Music enhancement through statistical pattern recognition and approximation



ChadJohnson
01-02-2006, 05:18 PM
Suppose you were to record a song and then convert it to MP3 at, say, 128kbps. The quality would be relatively poor. Well, do you think it would be possible for a piece of software to up the quality by statistically filling in the gaps through pattern recognition and approximation (done through the use of a neural network)? How much do you think you could improve the quality through such a method?

Sang-drax
01-02-2006, 05:36 PM
Absolutely! But it would require some serious research and would probably end up in a Ph.D. thesis. Ideally, the computer would recognize the different instruments and save the song in a MIDI-like format.

ChadJohnson
01-02-2006, 05:48 PM
Why a MIDI-like format? Could the inference mechanism be good enough to save it as an MP3?

sean
01-02-2006, 07:25 PM
When he refers to a MIDI-like format, I believe he's referring to your inferred goal of a perfect reproduction of what was originally recorded.

MP3 is a lossy compression format, so I don't know why you think this makes a good target format when you're trying to fill in gaps in data. The ultimate goal as I see it would be to have a program recognize the different parts of the music and produce new data accordingly. MIDI is far too coarse for this, but it's the same sort of idea - not storing just sound waves, but separating the sound into more usable information.

CornedBee
01-02-2006, 07:53 PM
so I don't know why you think this makes a good target format when you're trying to fill in gaps in data.
I think you misunderstood him. He wants to make up for the compression loss by algorithmically recreating the sound.

Which, given the nature of MP3 compression, I'm not sure is even logically possible.

ChadJohnson
01-02-2006, 09:41 PM
Yea, I don't know much about the MP3 format, but I guess I was thinking of just a general sound format. I picked MP3 because that's what all my music is in (legally). Maybe WAV or something else would work better.

But yea, I was thinking the sound file would be loaded into memory, so you'd have the raw data to work with (after decoding it of course), you'd fill in the gaps algorithmically, and then you'd write a brand new file.

gcn_zelda
01-02-2006, 10:00 PM
Why not just have the MP3 players do that? That way the MP3 file sizes and compression would stay the same, and it would still sound better.

CornedBee
01-02-2006, 10:02 PM
The problem with this is the irregularity of natural music. Let me try to illustrate what I mean.

Pattern recognition usually aims to recognize patterns based on complete data, and then to either take action based on what was recognized, or continue the pattern. One of the simplest forms of pattern is the number sequence. I'm sure you'll have no problems recognizing and continuing these patterns (even Excel can do that):

1 2 3 4 5 6 7 8

1 3 5 7 9 11

1 2 4 8 16 32
The nice thing about these sequences is that they are regular. That makes it even possible to fill in gaps:

1 3 5 7
In fact, they are so simple that they can be described by a formula.

e(i) = e(i-1)+1

Continuation is generally easier than filling in the gaps, though.

1 9 2 8 3 7 4 6 5 5 6 4
This is simple to continue. Can you fill in the gaps if I remove every second element, though? Without actually knowing how the sequence works?

1 2 3 4 5 6

Already the data loss has been significant, and what's worse, irrecoverable. But music isn't even that nice. Music is completely irregular.

1 5325 39 6436 35 6 235 2 32 6136013 35 92
You couldn't continue that sequence even if you tried to. That's because it's just random numbers. The problem is, there is a certain amount of randomness in natural music, too. You don't pick a guitar twice exactly the same way. You don't hit the drum in exactly the same place twice. You don't hit the same tone with your voice twice.

When you have two samples of audio, there's not really much you can do to find out what's between them. So you recognized it as part of a guitar sound? Great! If you have an exact sound profile of the guitar in question, being played the way it is in this song, with the exact effects settings, you can use this information to find the exact place within a sound sample that your two samples are at and interpolate between them using that information. Only, you also have to consider that bass in the background, and the drums obscuring the sound, and of course that guy with the raspy voice singing some weird made-up lyrics. Oh, you have all the samples for those, too? Great, why not play that instead of the low-quality song?
You get where I'm aiming? Natural music does not follow any patterns you can recognize, thus the only way to enhance low-quality data is to feed high-quality data into it. But if you can waste all that storage on high-quality data, you might as well get high-quality data in the first place.
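The number-sequence argument above can be sketched in a few lines of Python (my own illustration, not anything from a real codec): filling gaps by averaging neighbours recovers the regular sequence 1 3 5 7 perfectly, but guesses completely wrong values for the interleaved 1 9 2 8 3 7 4 6 sequence once every second element is gone.

```python
def fill_gaps(kept):
    """Guess the removed elements by averaging each pair of
    surviving neighbours (simple linear interpolation)."""
    filled = []
    for a, b in zip(kept, kept[1:]):
        filled.append(a)
        filled.append((a + b) / 2)  # our guess for the missing element
    filled.append(kept[-1])
    return filled

# Regular sequence 1 2 3 4 5 6 7 with every second element removed:
print(fill_gaps([1, 3, 5, 7]))  # [1, 2.0, 3, 4.0, 5, 6.0, 7] -- correct

# Interleaved sequence 1 9 2 8 3 7 4 with every second element removed:
# interpolation guesses 1.5, 2.5, 3.5 where the true values were 9, 8, 7.
print(fill_gaps([1, 2, 3, 4]))  # [1, 1.5, 2, 2.5, 3, 3.5, 4] -- wrong
```

The data needed to tell the two cases apart simply isn't in the surviving samples, which is the point being made here.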

ChadJohnson
01-02-2006, 10:45 PM
Good points. I guess you couldn't just take the average of two adjacent points and fill in the gap that way either.

CornedBee
01-03-2006, 06:14 AM
Well, you could, but the enhancement would be minimal, if any.

adrianxw
01-03-2006, 06:23 AM
>>> Well, you could, but the enhancement would be minimal, if any.

It could be totally negative in effect. Consider...

1 10 3 8 5 6 7 4

... chopped to...

1 3 5 7

... and "filled" as suggested...

1 2 3 4 5 6 7

... totally wrong.

Placing averaged points between existing points tends to flatten/soften the sound of music; it does not enhance it. It tends to sound like the listener is wearing thick woolen earmuffs.

A way to recover lost data points may be to fourier out all of the sine curves comprising the signal, then insert extra data points into the various curves before re-assembling the signal. I've never tried that. Music has a lot of sudden notes and chords appearing/going, so, particularly for low-frequency components, adding extra data points before or after they should appear will tend to "smear" the sound in and out.

I can see that working well for a single note from an instrument however.
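The idea of inserting extra data points into the extracted sines can be sketched with numpy (my own sketch, assuming a clean, band-limited input): take the FFT, lengthen the spectrum with zeros, and reassemble. For a single steady sine - the "single note from an instrument" case - the reconstruction is exact, which is why this suits sustained tones far better than sudden attacks.

```python
import numpy as np

def fourier_upsample(x, factor):
    """Upsample a real signal by zero-padding its FFT spectrum:
    no new detail is invented, just smoothly continued sines."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    padded = np.zeros(n * factor // 2 + 1, dtype=complex)
    padded[: len(spectrum)] = spectrum      # new high bins stay zero
    # irfft normalizes by the output length, so rescale by `factor`
    return np.fft.irfft(padded, n * factor) * factor

t = np.arange(64)
tone = np.sin(2 * np.pi * 4 * t / 64)  # one steady 4-cycle sine
hi = fourier_upsample(tone, 4)         # 256 points instead of 64
```

Every fourth point of `hi` lands back on the original samples, and the points in between follow the underlying sine rather than a straight-line average.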

CornedBee
01-03-2006, 06:42 AM
Which requires you to do MIDI analysis first, which tends to work well only with high-quality music, and of course reduces sound quality by itself.

anonytmouse
01-03-2006, 07:27 AM
Well do you think it would be possible for a piece of software to up the quality by statistically filling in the gaps through pattern recognition and approximation?

Isn't this what compression algorithms already do?

CornedBee
01-03-2006, 07:52 AM
No, not really. Formats like MP3 or Vorbis analyze the data and first cast out everything the human ear doesn't hear properly anyway. This greatly simplifies the data, making it easier to apply further compression to it.
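As a very crude caricature of that first step (nothing like the real psychoacoustic models in MP3 or Vorbis - just thresholding weak spectral bins), the "cast out what you can't hear" idea might look like this:

```python
import numpy as np

def crude_lossy(x, keep_ratio=0.01):
    """Zero out spectral bins far weaker than the strongest one,
    a stand-in for discarding 'inaudible' content."""
    spectrum = np.fft.rfft(x)
    mags = np.abs(spectrum)
    spectrum[mags < keep_ratio * mags.max()] = 0
    return np.fft.irfft(spectrum, len(x))

t = np.arange(1024)
tone = np.sin(2 * np.pi * 5 * t / 1024)
noisy = tone + 0.001 * np.random.default_rng(0).standard_normal(1024)
cleaned = crude_lossy(noisy)  # faint hiss discarded, tone kept
```

After thresholding, almost every bin is zero, so a later entropy-coding stage has far less to store - but the discarded detail is gone for good, which is exactly the loss the thread is asking whether software could recreate.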

adrianxw
01-03-2006, 07:57 AM
>>> Which requires you to do MIDI analysis first,

People were doing fourier decomposition of waveforms and music years before MIDI was invented. I was fiddling with it myself in the same year the MIDI standard was first proposed, let alone finalised.

The transform I propose will, of course, work better on a better quality signal, and best on an analog signal.

CornedBee
01-03-2006, 08:06 AM
But does Fourier decomposition give you individual instruments? I thought it just gives you individual sines.

Sang-drax
01-03-2006, 08:09 AM
The transform I propose will, of course, work better on a better quality signal, and best on an analog signal.
Yes, you should start with a better-quality signal, not with an MP3 stream.

Sang-drax
01-03-2006, 08:16 AM
The next question is whether video streams can be compressed with some kind of pattern recognition. There are half-working algorithms able to reconstruct a 3D scene from a photograph. For each scene in a movie, a 3D scene would be reconstructed (much easier with multiple photographs, especially if the camera is moving), the textures would be extracted, and only the movement of the vertices saved. The compression ratio would be very high.

CornedBee
01-03-2006, 08:20 AM
Light sources. If you don't want to save one texture per frame per object (something I imagine might take more space than the video itself), you have to analyze the lighting of the scene, so you can correctly recreate the old look.

Sang-drax
01-03-2006, 08:29 AM
Light sources. If you don't want to save one texture per frame per object (something I imagine might take more space than the video itself), you have to analyze the lighting of the scene, so you can correctly recreate the old look.
Yes, and this will probably be very hard when both the objects and the light sources are moving. In extreme cases there could be a simpler algorithm to fall back on (e.g. MPEG).

adrianxw
01-03-2006, 08:45 AM
>>> But does Fourier decomposition give you individual instruments?

No, but then, you don't need the individual instruments. If you extract a bunch of ratty sines, correct the sines, and put them back together, you are re-assembling the signal as a whole.

If you consider a "piece of music" where all of the "instruments" were sine wave generators, then you will be extracting the "instruments", correcting them, and reassembling the piece.

If however, one of the instruments is a square wave generator, you will not be extracting the square wave, but the series of sines that compose the square wave. Correcting those sines will still correct the square wave, but you will never have had the square wave isolated as an "instrument".

The problem, as always, in dealing with a digitised signal is that the waveform you see is "stepped". In Fourier analysis terms, this adds a vast number of higher-frequency components to the signal. These would need some very intelligent filtering, but it could be done as long as the sample frequency of the original piece is known.
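The square-wave point can be made concrete with the textbook Fourier series (a standard identity, not something specific to this post): the square wave never appears as a single component, only as the series of odd sine harmonics that sum to it, and keeping more of those harmonics reproduces it more faithfully.

```python
import numpy as np

t = np.linspace(0, 1, 1000, endpoint=False)
square = np.sign(np.sin(2 * np.pi * t))  # ideal square wave, one period

def square_series(t, k):
    """Partial Fourier series of a square wave: the first k odd
    harmonics, each with amplitude 4/(pi*n)."""
    s = np.zeros_like(t)
    for n in range(1, 2 * k, 2):          # n = 1, 3, 5, ...
        s += (4 / (np.pi * n)) * np.sin(2 * np.pi * n * t)
    return s

# More harmonics -> a closer match to the square wave.
err_few = np.abs(square - square_series(t, 5)).mean()
err_many = np.abs(square - square_series(t, 50)).mean()
```

So "correcting" the extracted sines does correct the square wave as a whole, even though the square wave itself was never isolated as an instrument.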

curlious
01-04-2006, 11:27 PM
How about this idea.
Let's say you have a musical piece sampled at 128Kb/s,
then you have the same piece sampled at 512Kb/s

The 512Kb/s file is the master file on a server while the 128Kb/s file is on the client machine.

The idea is to stream the missing data to the client to enhance the sound to the same quality as the 512Kb/s recording, but not incur the space penalties on the client machine or necessarily have the original track even on the client.

The data would be synchronized and assembled in real time (or as close to it as you could get).

ChadJohnson
01-05-2006, 01:07 AM
Why not just load the whole thing in memory off the server and not have the client copy? Wouldn't this put about the same strain on both machines?

curlious
01-05-2006, 02:19 AM
Digital rights: suppose the artist doesn't want his master tracks copied.

ChadJohnson
01-05-2006, 03:40 AM
If you're streaming from a central server, and the people hosting the server don't want you copying their music (which I think is what you mean), there would be ways around that.

You could just have a cable go from your sound card's audio out to the audio in and record the song that way. You could also get a program that can write directly from the sound card to a sound file on your hard drive. And quality loss would not be much of an issue there if they're streaming at that high a bitrate.

What would be the difference between the master tracks and a copy of the master tracks, besides a little quality loss?

CornedBee
01-05-2006, 05:09 AM
If you streamed the master as deltas, they would be kind of useless on their own ... hmm ...

curlious
01-05-2006, 10:08 AM
What are deltas in relationship to streamed music?
What are you thinking about?

Ultimately the master track should be an analog recording. I was just thinking of a very high bitrate master file that could be used to fill in gaps on the client's copy so they could get a true studio-quality sound.

As far as the digital rights, some people will not even allow their music to be recorded so what do you do then?

CornedBee
01-05-2006, 10:13 AM
If those gap fillers are sent only as offsets to the last value available on the client, the data amount is the same (or perhaps even less), and someone without the client file couldn't use the server file. You could say the master file is encrypted using the client file as the key.

Of course, that would require a new media format. MP3 can't do that.
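A toy numeric sketch of the offsets idea (invented sample values, not a real format): the server streams only the deltas against the client's low-quality copy, which are meaningless on their own.

```python
low  = [10, 20, 30, 40]   # client's low-quality samples
high = [12, 19, 33, 41]   # server's high-quality master

# What gets streamed: per-sample differences, useless without `low`.
deltas = [h - l for h, l in zip(high, low)]

# What the client reconstructs by adding the deltas back on.
restored = [l + d for l, d in zip(low, deltas)]

print(deltas)    # [2, -1, 3, 1]
print(restored)  # [12, 19, 33, 41] -- master recovered on the client
```

The client file really does act like a key here: without `low`, the stream of deltas could decode to anything.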

curlious
01-05-2006, 10:23 AM
I was thinking along similar lines but did not want to get off topic with encryption. Besides, I have some moral quandaries with putting the idea out there in relation to music.

The original post is more about artificial intelligence and musical analysis. So I am just going to drop it and go talk to my chat bot Desti.

Have a good day people psi a nar a

Epo
01-05-2006, 11:22 PM
How about something along the lines of...

All music has repetition, some less than others, some more. Be it a bass line, drum beat, etc. So some sort of MIDI form where you can tell the difference between instruments/channels would be a great first step toward simplicity.

Then, you look for the patterns. Consider the whole sequence for one channel:
0.0 8.1 4.2 6.9 9.5 7.8...
Not much of a pattern.

But if at one point during the song, you find the pattern:
0.0 GAP GAP 6.9 GAP 7.8
And at another point:
GAP 8.1 GAP GAP 9.5 7.8
And one more:
0.0 8.1 4.2 GAP GAP GAP

And if you can find these overlapping similarities, you can fill in the gaps based on the information from the other portions. Of course, perhaps you could allow a leeway of 0.1 in your values, so if the only thing holding you back from filling in a GAP is that one pattern says 4.2 and another says 4.3 (the drum was hit in a different spot, but still relatively the same output), you could still count it as valid and perhaps average the two.

Of course, something like singing will yield fewer patterns than a bass line, but it could be somewhere to start.

But if you can fill this pattern for multiple channels, then recombine them into a single file...well, I have no idea what would happen, or how you would split them up in the first place.

Even Winamp comes with an equalizer that controls the behaviour of different decibel levels in relation to another property (not sure what it is, but the units are K, and it may or may not be labeled preamp). But either way, there are a variety of things to reconstruct: the decibel levels, perhaps the frequency of the sounds if you can analyze that, etc.

That's my best idea for a reconstruction method.
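The overlapping-pattern merge described above can be sketched like this (the exact tolerance rule and the GAP handling are my own assumptions about the idea):

```python
GAP = None  # marker for a missing value in one occurrence of a pattern

def merge(occurrences, tol=0.1):
    """Combine overlapping copies of the same pattern column by column:
    fill each GAP from whichever copy has a value, average when copies
    agree to within `tol`, and leave the gap if they conflict."""
    merged = []
    for column in zip(*occurrences):
        vals = [v for v in column if v is not GAP]
        if not vals:
            merged.append(GAP)                    # nothing known here
        elif max(vals) - min(vals) <= tol:
            merged.append(round(sum(vals) / len(vals), 2))
        else:
            merged.append(GAP)                    # conflicting copies
    return merged

occurrences = [
    [0.0, GAP, GAP, 6.9, GAP, 7.8],
    [GAP, 8.1, GAP, GAP, 9.5, 7.8],
    [0.0, 8.1, 4.2, GAP, GAP, GAP],
]
print(merge(occurrences))  # [0.0, 8.1, 4.2, 6.9, 9.5, 7.8]
```

With the three gappy copies from the post, every position is covered by at least one occurrence, so the full sequence comes back out.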

curlious
01-06-2006, 07:44 AM
A MIDI file is just like a musical score, isn't it?
So this is already done, but we all know synths and MIDIs sound bad at their best.
I think Bach studied mathematics in relation to music, though he was a composer. But the purpose of a musical score is artistic interpretation, not recorded regurgitation, so some AI that could interpret a score decently and play instruments would freak me out. lol

CornedBee
01-06-2006, 11:47 AM
So this is already done, but we all know synths and MIDIs sound bad at their best.
Considering how much of today's music is MIDI with a bit of directly recorded sound added to it, that's a pretty risky statement.
Although, considering the quality of much of today's music, perhaps not.