# Thread: Music enhancement through statistical pattern recognition and approximation

1. But does Fourier decomposition give you individual instruments? I thought it just gives you individual sines.

2. The transform I propose will, of course, work better on a higher-quality signal, and best on an analog signal.
Yes, you should start with a better-quality signal, not with an MP3 stream.
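To make the first question concrete: a Fourier transform decomposes a signal into sinusoidal components, not into instruments. A toy NumPy sketch (the sample rate and tone frequencies are arbitrary choices, picked so each tone lands on an exact FFT bin):

```python
import numpy as np

# A toy "recording": two pure tones (440 Hz and 660 Hz) sampled at 8192 Hz
# for exactly one second, so each tone falls on an exact FFT bin.
fs = 8192
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)

# Fourier decomposition expresses the signal as a sum of sinusoids.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# The two dominant bins sit at the tone frequencies: sines, not "instruments".
peaks = np.sort(freqs[np.argsort(spectrum)[-2:]]).tolist()
print(peaks)   # -> [440.0, 660.0]
```

If two instruments played the same 440 Hz tone, they would land in the same bins, which is exactly why the decomposition cannot separate instruments by itself.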

3. The next question is whether video streams can be compressed with some kind of pattern recognition. There are half-working algorithms able to reconstruct a 3D scene from a photograph. For each scene in a movie, a 3D scene would be reconstructed (much easier with multiple photographs, especially if the camera is moving), the textures would be extracted, and only the movement of the vertices saved. The compression ratio would be very high.

4. Light sources. If you don't want to save one texture per frame per object (something I imagine might take more space than the video itself), you have to analyze the lighting of the scene, so you can correctly recreate the old look.

5. Originally Posted by CornedBee
Light sources. If you don't want to save one texture per frame per object (something I imagine might take more space than the video itself), you have to analyze the lighting of the scene, so you can correctly recreate the old look.
Yes, and this will probably be very hard when both the objects and the light sources are moving. In extreme cases there could be a simpler algorithm to fall back on (e.g. MPEG).

6. >>> But does Fourier decomposition give you individual instruments?

No, but then, you don't need the individual instruments. If you extract a bunch of ratty sines, correct the sines, and put them back together, you are re-assembling the signal as a whole.

If you consider a "piece of music" where all of the "instruments" were sine wave generators, then you will be extracting the "instruments", correcting them, and reassembling the piece.

If however, one of the instruments is a square wave generator, you will not be extracting the square wave, but the series of sines that compose the square wave. Correcting those sines will still correct the square wave, but you will never have had the square wave isolated as an "instrument".

The problem, as always, in dealing with a digitised signal is that the waveform you see is "stepped". In Fourier-analysis terms, this adds a vast number of higher-frequency components to the signal. These would need some very intelligent filtering, but it could be done as long as the sample frequency of the original piece is known.
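The square-wave point can be checked numerically: the FFT of a square wave shows energy at the odd harmonics (with amplitudes falling off as 1/n), never the square wave as a single component. A small sketch, with an arbitrarily chosen 100 Hz fundamental:

```python
import numpy as np

# One second of a 100 Hz square-wave "instrument" (sign of a sine).
fs = 8192
t = np.arange(fs) / fs
square = np.sign(np.sin(2 * np.pi * 100 * t))

spectrum = np.abs(np.fft.rfft(square))
freqs = np.fft.rfftfreq(len(square), d=1 / fs)

# The strongest components are the odd harmonics 100, 300, 500, 700 Hz --
# the Fourier series of a square wave, not the square wave itself.
top = np.sort(freqs[np.argsort(spectrum)[-4:]]).tolist()
print(top)   # -> [100.0, 300.0, 500.0, 700.0]
```

Correcting each of those sines and summing them back reassembles (an approximation of) the square wave, which is the point made above.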

7. Let's say you have a musical piece encoded at 128 kb/s, and the same piece encoded at 512 kb/s.

The 512 kb/s file is the master file on a server, while the 128 kb/s file is on the client machine.

The idea is to stream the missing data to the client to enhance the sound to the same quality as the 512 kb/s recording, without incurring the space penalty on the client machine or necessarily having the original track on the client at all.

The data would be synchronized and assembled in real time (or as close as you could get).
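The enhancement-streaming idea can be sketched numerically. This is a hypothetical illustration, not a real codec: coarse quantization stands in for the low-bitrate client copy, and the "stream" is simply the residual between master and client.

```python
import numpy as np

# Stand-in signals: the "master" is a high-resolution waveform, the client
# holds a coarsely quantized copy, and the server streams only the residual.
rng = np.random.default_rng(0)
master = rng.uniform(-1.0, 1.0, size=1000)   # stand-in for the 512 kb/s file

step = 1 / 128                               # coarse quantization ~ "128 kb/s" copy
client = np.round(master / step) * step      # what the client already has

residual = master - client                   # the data the server would stream
restored = client + residual                 # client-side reassembly

print(np.allclose(restored, master))         # -> True
```

The residual is bounded by half a quantization step, so the streamed "missing data" is small relative to the master file, which is the appeal of the scheme.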

8. Why not just load the whole thing into memory off the server and not keep a client copy at all? Wouldn't this put about the same strain on both machines?

9. Digital rights: suppose the artist doesn't want his master tracks copied.

10. If you're streaming from a central server, and the people hosting the server don't want you copying their music (which I think is what you mean), there would be ways around that.

You could just run a cable from your sound card's audio out to its audio in and record the song that way. You could also get a program that writes directly from the sound card to a sound file on your hard drive. And quality loss would not be much of an issue there if they're streaming at that high a bitrate.

What would be the difference between the master tracks and a copy of the master tracks, besides a little quality loss?

11. If you streamed the master as deltas, they would be kind of useless on their own ... hmm ...

12. What are deltas in relationship to streamed music?

Ultimately the master track should be an analog recording. I was just thinking of a very high-bitrate master file that could be used to fill in gaps in the client's copy so they could get true studio-quality sound.

As far as digital rights go, some people will not even allow their music to be recorded, so what do you do then?

13. If those gap fillers are sent only as offsets to the last value available on the client, the data amount is the same (or perhaps even less), and someone without the client file couldn't use the server file. You could say the master file is encrypted using the client file as the key.

Of course, that would require a new media format. MP3 can't do that.
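A minimal sketch of deltas "keyed" to the client file, under the simplifying assumption that both files are aligned sample arrays. Without the client copy, the deltas alone carry no usable audio:

```python
import numpy as np

# Hypothetical model: the client's copy is the master plus degradation noise.
rng = np.random.default_rng(1)
master = rng.normal(size=8)              # high-quality samples (stand-in)
noise = rng.normal(scale=0.05, size=8)
client = master + noise                  # the client's degraded copy

deltas = master - client                 # what actually goes over the wire

# With the client file ("the key"), reconstruction is exact:
print(np.allclose(client + deltas, master))   # -> True

# Without it, the deltas are just the negated degradation noise --
# useless on their own, as post 11 suggested.
print(np.allclose(deltas, -noise))            # -> True
```

A real codec would of course work on compressed frames rather than raw sample arrays, but the keying property is the same.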

14. I was thinking along similar lines but did not want to get off topic with encryption. Besides, I have some moral quandaries about putting the idea out there in relation to music.

The original post is more about artificial intelligence and musical analysis. So I am just going to drop it and go talk to my chat bot Desti.

Have a good day people psi a nar a

15. How about something along the lines of...

All music has repetition. Some pieces have less, some more. Be it a bass line, a drum beat, etc. So some sort of MIDI-like form where you can tell the difference between instruments/channels would be a great first step toward simplicity.

Then, you look for the patterns. Consider the whole sequence for one channel:
0.0 8.1 4.2 6.9 9.5 7.8...
Not much of a pattern.

But if at one point during the song, you find the pattern:
0.0 GAP GAP 6.9 GAP 7.8
And at another point:
GAP 8.1 GAP GAP 9.5 7.8
And one more:
0.0 8.1 4.2 GAP GAP GAP

And if you can find these overlapping similarities, you can fill in the gaps based on the information from the other portions. Of course, perhaps you could allow a leeway of 0.1 in your values, so if the only thing stopping you from filling in a GAP is that one pattern says 4.2 and another says 4.3 (the drum was hit in a slightly different spot, but still produced relatively the same output), you could still count it as valid and perhaps average the two.
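The overlapping-pattern idea above can be sketched in a few lines. `merge_patterns` and the 0.1 tolerance are illustrative names and values, not an established algorithm; `None` stands for GAP:

```python
# Merge several partial observations of the same repeated sequence,
# treating values within a tolerance as "the same" and averaging them.
TOLERANCE = 0.1

def merge_patterns(patterns):
    merged = []
    for i in range(len(patterns[0])):
        values = [p[i] for p in patterns if p[i] is not None]
        if not values:
            merged.append(None)                       # still a gap
        elif max(values) - min(values) <= TOLERANCE:
            merged.append(sum(values) / len(values))  # agree: average them
        else:
            merged.append(None)                       # conflicting readings
    return merged

# The three partial patterns from the example above:
patterns = [
    [0.0, None, None, 6.9, None, 7.8],
    [None, 8.1, None, None, 9.5, 7.8],
    [0.0, 8.1, 4.2, None, None, None],
]
print(merge_patterns(patterns))   # -> [0.0, 8.1, 4.2, 6.9, 9.5, 7.8]
```

The three fragments overlap enough to recover the full sequence, which is exactly the redundancy the compression idea would exploit.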

Of course, something like singing will yield fewer patterns than a bass line, but it could be somewhere to start.

But if you can fill this pattern for multiple channels, then recombine them into a single file...well, I have no idea what would happen, or how you would split them up in the first place.

Even Winamp comes with an equalizer that controls the gain at different frequency bands (the sliders are labeled in kHz, plus a preamp control). Either way, there are a variety of things to reconstruct: the decibel levels, perhaps the frequencies of the sounds if you can analyze them, etc.
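For the equalizer part, a crude frequency-domain sketch of per-band gain (the band edge at 1 kHz and the -6 dB cut are arbitrary example values, not how Winamp actually implements it):

```python
import numpy as np

# One second of two tones, one in each "band".
fs = 8192
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 2000 * t)

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# Equalization: scale each band's gain, expressed in decibels.
gain_db = np.where(freqs < 1000, 0.0, -6.0)   # cut everything above 1 kHz
spectrum *= 10 ** (gain_db / 20)

equalized = np.fft.irfft(spectrum, n=len(signal))
```

After this, the 2000 Hz tone comes out at roughly half its original amplitude (a 6 dB cut is a factor of about 0.5) while the 200 Hz tone is untouched. Real equalizers use filter banks rather than a whole-file FFT, but the per-band gain idea is the same.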

That's my best idea for a reconstruction method.