Thread: Music enhancement through statistical pattern recognition and approximation

  1. #1
    Chad Johnson
    Join Date
    May 2004
    Posts
    154

    Question Music enhancement through statistical pattern recognition and approximation

    Suppose you were to record a song and then convert it to MP3 at, say, 128 kbps. The quality would be relatively poor. Well, do you think it would be possible for a piece of software to up the quality by statistically filling in the gaps through pattern recognition and approximation (done, for example, with a neural network)? How much do you think you could improve the quality with such a method?

  2. #2
    Sang-drax
    Join Date
    May 2002
    Location
    Göteborg, Sweden
    Posts
    2,072
    Absolutely! But it would require some serious research and would probably end up in a Ph.D. thesis. Ideally, the computer would recognize the different instruments and save the song in a MIDI-like format.
    Last edited by Sang-drax : Tomorrow at 02:21 AM. Reason: Time travelling

  3. #3
    Chad Johnson
    Join Date
    May 2004
    Posts
    154
    Why a MIDI-like format? Could the inference mechanism be good enough to save it as an MP3?

  4. #4
    Registered User
    Join Date
    Sep 2001
    Posts
    4,912
    When he refers to a MIDI-like format, I believe he's referring to your inferred goal of perfectly reproducing what was originally recorded.

    MP3 is a lossy compression format, so I don't know why you think this makes a good target format when you're trying to fill in gaps in data. The ultimate goal, as I see it, would be to have a program recognize the different parts of the music and produce new data accordingly. MIDI is far too coarse for this, but it's the same sort of idea: not storing just sound waves, but separating the sound into more usable information.

  5. #5
    CornedBee (Cat without Hat)
    Join Date
    Apr 2003
    Posts
    8,895
    Quote: "...so I don't know why you think this makes a good target format when you're trying to fill in gaps in data."
    I think you misunderstood him. He wants to make up for the compression loss by algorithmically recreating the sound.

    Which, given the nature of MP3 compression, I'm not sure is even logically possible.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  6. #6
    Chad Johnson
    Join Date
    May 2004
    Posts
    154
    Yeah, I don't know much about the MP3 format; I guess I was thinking of a general sound format. I picked MP3 because that's what all my music is in (legally). Maybe WAV or something else would work better.

    But yeah, I was thinking the sound file would be loaded into memory so you'd have the raw data to work with (after decoding it, of course). You'd fill in the gaps algorithmically and then write a brand-new file.
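    A minimal Python sketch of that load, fill, write pipeline, assuming the audio has already been decoded to a 16-bit mono PCM WAV file (decoding MP3 itself needs an external decoder and is out of scope). The names `enhance_wav`, `repeat_fill` and `make_wav` are hypothetical. `repeat_fill` is only a placeholder that repeats each sample so the output has twice the sample rate; it adds no information. Whatever gap-filling algorithm is being debated would be passed in as `fill`.

```python
import array
import io
import sys
import wave


def repeat_fill(samples):
    """Placeholder gap filler (hypothetical): repeat every sample once.
    It doubles the sample count but adds no new information."""
    out = []
    for s in samples:
        out.extend((s, s))
    return out


def enhance_wav(src, dst, fill=repeat_fill, factor=2):
    """Read 16-bit mono PCM from `src` (path or file object), pass the raw
    samples through `fill`, and write a brand-new WAV to `dst` at `factor`
    times the original sample rate."""
    with wave.open(src, "rb") as r:
        if r.getnchannels() != 1 or r.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = r.getframerate()
        samples = array.array("h", r.readframes(r.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    filled = array.array("h", fill(samples.tolist()))
    if len(filled) != factor * len(samples):
        raise ValueError("fill must return factor times as many samples")
    if sys.byteorder == "big":
        filled.byteswap()
    with wave.open(dst, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate * factor)
        w.writeframes(filled.tobytes())


def make_wav(samples, rate):
    """Build an in-memory 16-bit mono WAV, handy for trying the pipeline out."""
    buf = io.BytesIO()
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(data.tobytes())
    buf.seek(0)
    return buf
```

    With real files you would call something like `enhance_wav("in.wav", "out.wav", fill=my_filler)`, where `my_filler` is whatever reconstruction you want to test.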

  7. #7
    gcn_zelda (Rad)
    Join Date
    Mar 2003
    Posts
    942
    Why not just have the MP3 player do that during playback? That way the MP3 file sizes and compression would stay the same, and wouldn't it still sound better?

  8. #8
    CornedBee (Cat without Hat)
    Join Date
    Apr 2003
    Posts
    8,895
    The problem with this is the irregularity of natural music. Let me try to illustrate what I mean.

    Pattern recognition usually aims to recognize patterns based on complete data, and then to either take action based on what was recognized, or continue the pattern. One of the simplest forms of pattern is the number sequence. I'm sure you'll have no problems recognizing and continuing these patterns (even Excel can do that):
    Code:
    1 2 3 4 5 6 7 8
    Code:
    1 3 5 7 9 11
    Code:
    1 2 4 8 16 32
    The nice thing about these sequences is that they are regular. That even makes it possible to fill in gaps:
    Code:
    1   3   5   7
    In fact, they are so simple that they can be described by a formula, for example this one for the first sequence:
    Code:
    e(i) = e(i-1)+1
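    The "recognize the rule, then continue it" idea for these regular sequences fits in a few lines. A minimal Python sketch, assuming integer sequences and assuming the only rules worth trying are "add a constant" and "multiply by a constant" (`find_rule` and `continue_seq` are made-up names):

```python
from fractions import Fraction


def find_rule(seq):
    """Return ("add", d) if seq is arithmetic, ("mul", r) if it is geometric,
    or None if neither simple rule fits every consecutive pair."""
    if len(seq) < 2:
        return None
    diffs = {b - a for a, b in zip(seq, seq[1:])}
    if len(diffs) == 1:
        return ("add", diffs.pop())
    if 0 not in seq:
        # Fractions keep ratios exact, so 2/1 and 4/2 compare equal
        ratios = {Fraction(b, a) for a, b in zip(seq, seq[1:])}
        if len(ratios) == 1:
            return ("mul", ratios.pop())
    return None


def continue_seq(seq, count):
    """Extend seq by `count` terms using the rule found by find_rule."""
    rule = find_rule(seq)
    if rule is None:
        raise ValueError("no simple rule fits this sequence")
    kind, k = rule
    out = list(seq)
    for _ in range(count):
        nxt = out[-1] + k if kind == "add" else out[-1] * k
        # turn whole-number Fractions back into plain ints
        out.append(int(nxt) if Fraction(nxt).denominator == 1 else nxt)
    return out


print(continue_seq([1, 2, 3, 4, 5, 6, 7, 8], 2))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(continue_seq([1, 3, 5, 7, 9, 11], 2))       # [1, 3, 5, 7, 9, 11, 13, 15]
print(continue_seq([1, 2, 4, 8, 16, 32], 2))      # [1, 2, 4, 8, 16, 32, 64, 128]
print(find_rule([1, 9, 2, 8, 3, 7]))              # None
```

    The last line hints at the problem with anything less regular: as soon as the data doesn't match one of the rules the program knows about, it has nothing to go on.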
    Continuation is generally easier than filling in the gaps, though.
    Code:
    1 9 2 8 3 7 4 6 5 5 6 4
    This is simple to continue. Can you fill in the gaps if I remove every second element, though? Without actually knowing how the sequence works?
    Code:
    1   2   3   4   5   6
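    The information loss can be shown directly. In this small Python sketch (the second sequence is made up for the comparison), two different sequences lose every second element and end up identical, so nothing that sees only the survivors can tell which one it started from:

```python
def drop_every_second(seq):
    """Keep elements 0, 2, 4, ... and discard the rest."""
    return seq[::2]


interleaved = [1, 9, 2, 8, 3, 7, 4, 6, 5, 5, 6, 4]  # the sequence from the post
counting = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7]     # a made-up alternative

print(drop_every_second(interleaved))  # [1, 2, 3, 4, 5, 6]
print(drop_every_second(counting))     # [1, 2, 3, 4, 5, 6]
print(interleaved == counting)         # False
```

    Any values at all could sit at the odd positions; the surviving half carries no trace of them.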
    Already the data loss has been significant, and what's worse, irrecoverable. But music isn't even that nice. Music is completely irregular.
    Code:
    1 5325 39 6436 35 6 235 2  32  6136013 35 92
    You couldn't continue that sequence even if you tried, because it's just random numbers. The problem is, there is a certain amount of randomness in natural music, too. You don't pick a guitar string exactly the same way twice. You don't hit a drum in exactly the same place twice. You don't hit exactly the same tone with your voice twice.

    When you have two samples of audio, there's not much you can do to find out what lies between them. So you recognized it as part of a guitar sound? Great! If you have an exact sound profile of the guitar in question, played the way it is in this song with the exact effects settings, you can use that information to find exactly where within the sound your two samples lie and interpolate between them. Only, you also have to consider the bass in the background, the drums obscuring the sound, and of course that guy with the raspy voice singing some weird made-up lyrics. Oh, you have all the samples for those, too? Great, why not play those instead of the low-quality song?
    You see where I'm going with this? Natural music does not follow any patterns you can recognize, so the only way to enhance low-quality data is to feed high-quality data into it. But if you can afford all that storage for high-quality data, you might as well get the high-quality recording in the first place.

  9. #9
    Chad Johnson
    Join Date
    May 2004
    Posts
    154
    Good points. I guess you couldn't just take the average of two adjacent points and fill the gap that way either.

  10. #10
    CornedBee (Cat without Hat)
    Join Date
    Apr 2003
    Posts
    8,895
    Well, you could, but the enhancement would be minimal, if any.

  11. #11
    adrianxw (It's full of stars)
    Join Date
    Aug 2001
    Posts
    4,829
    >>> Well, you could, but the enhancement would be minimal, if any.

    It could be totally negative in effect. Consider...

    1 10 3 8 5 6 7 4

    ... chopped to...

    1 3 5 7

    ... and "filled" as suggested...

    1 2 3 4 5 6 7

    ... totally wrong.
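    The same example, run through the averaging fill, in a short Python sketch (`midpoint_fill` is a name made up here). The guessed values are off by up to 8, and the final 4 is not recovered at all because nothing after it survived the chop:

```python
def midpoint_fill(kept):
    """Rebuild a sequence by inserting the average of each adjacent pair."""
    out = []
    for a, b in zip(kept, kept[1:]):
        out.extend((a, (a + b) / 2))
    out.append(kept[-1])
    return out


original = [1, 10, 3, 8, 5, 6, 7, 4]
kept = original[::2]                  # [1, 3, 5, 7]
filled = midpoint_fill(kept)
errors = [f - o for f, o in zip(filled, original)]
print(filled)                         # [1, 2.0, 3, 4.0, 5, 6.0, 7]
print(errors)                         # [0, -8.0, 0, -4.0, 0, 0.0, 0]
```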

    Placing averaged points between existing points tends to flatten or soften the sound of music rather than enhance it. The result sounds as if the listener is wearing thick woolen earmuffs.
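    One way to see the earmuffs effect: assume the kept samples contain a single tone $A\cos(\omega n + \varphi)$, with $\omega$ in radians per kept sample and $0 \le \omega < \pi$. By the sum-to-product identity, the averaged sample inserted between $n$ and $n+1$ is the true half-way value scaled by $\cos(\omega/2)$:

```latex
\frac{A\cos(\omega n + \varphi) + A\cos\big(\omega(n+1) + \varphi\big)}{2}
  = \cos\!\left(\frac{\omega}{2}\right) \cdot A\cos\!\left(\omega\left(n + \tfrac{1}{2}\right) + \varphi\right)
```

    Low tones ($\omega \approx 0$) come through almost untouched, but the factor drops to $\cos(\pi/4) \approx 0.71$ at half the Nyquist frequency and toward 0 near it, so the inserted samples systematically under-represent the treble.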

    One way to recover lost data points may be to Fourier-decompose the signal into all of its component sine curves, then insert extra data points into the individual curves before reassembling the signal. I've never tried that. Music has a lot of notes and chords suddenly appearing and disappearing, so, particularly for low-frequency components, adding extra data points before or after they should appear will tend to "smear" the sound in and out.

    I can see that working well for a single note from an instrument, however.
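    A minimal Python sketch of the "split into sine curves, add points, reassemble" idea. It uses the standard trigonometric-interpolation trick of zero-padding the spectrum; that choice of method is an assumption, since the post doesn't name one, and `fourier_upsample` is a hypothetical name. The naive O(n²) DFT keeps it dependency-free; a real tool would use an FFT library. For a steady tone that completes a whole number of cycles in the window, like a single sustained note, it reproduces the in-between points exactly. A note that starts or stops mid-window is represented by sine curves spanning the whole window, which is the smearing described above.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for a demo)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def fourier_upsample(x, factor):
    """Return len(x) * factor samples of the band-limited periodic signal
    through x: every existing frequency bin keeps its amplitude and phase,
    and the new high-frequency bins are left at zero."""
    n, m = len(x), len(x) * factor
    spectrum = dft(x)
    padded = [0j] * m
    for k in range(n):
        if n % 2 == 0 and k == n // 2:
            # The Nyquist bin is ambiguous (+n/2 or -n/2): split it evenly.
            padded[k] += spectrum[k] / 2
            padded[m - k] += spectrum[k] / 2
        else:
            signed = k if k <= n // 2 else k - n   # bins above n/2 are negative
            padded[signed % m] += spectrum[k]      # negative ones wrap to the end
    # idft divides by m rather than n, so scale back up by the factor
    return [factor * v.real for v in idft(padded)]


n = 16
note = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]  # 3 cycles per window
fine = fourier_upsample(note, 2)
truth = [math.sin(2 * math.pi * 3 * t / (2 * n)) for t in range(2 * n)]
print(max(abs(a - b) for a, b in zip(fine, truth)))  # floating-point noise only
```

    Feeding it a window that contains a note's attack instead of a steady tone would show the other side: the new points in the "silent" part before the attack generally pick up nonzero values.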
    Wave upon wave of demented avengers march cheerfully out of obscurity unto the dream.

  12. #12
    CornedBee (Cat without Hat)
    Join Date
    Apr 2003
    Posts
    8,895
    Which requires you to do MIDI analysis first, which tends to work well only with high-quality music, and of course reduces sound quality by itself.

  13. #13
    anonytmouse (Yes, my avatar is stolen)
    Join Date
    Dec 2002
    Posts
    2,544
    Quote: "Well, do you think it would be possible for a piece of software to up the quality by statistically filling in the gaps through pattern recognition and approximation?"
    Isn't this what compression algorithms already do?

  14. #14
    CornedBee (Cat without Hat)
    Join Date
    Apr 2003
    Posts
    8,895
    No, not really. Formats like MP3 or Vorbis analyze the data and first throw out everything the human ear can't hear properly anyway. This greatly simplifies the data, making it easier to apply other compression to it.
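    A toy illustration of why that discarded detail can't be recovered afterwards. Real encoders transform the audio into frequency-domain coefficients, pick quantizer step sizes from a psychoacoustic model, and then losslessly pack what remains; this Python sketch keeps only the quantization step, with a single made-up step size and made-up coefficient values:

```python
def quantize(coeffs, step):
    """Round each coefficient to an integer multiple of `step`; only the
    integers would be stored, so detail finer than the step is gone."""
    return [round(c / step) for c in coeffs]


def dequantize(indices, step):
    """What a decoder can rebuild from the stored integers."""
    return [i * step for i in indices]


step = 2.0                        # made-up step size
a = [10.2, 0.3, -4.9, 0.04]       # made-up "spectral coefficients"
b = [9.8, -0.4, -4.6, 0.0]        # a different input

print(quantize(a, step))                          # [5, 0, -2, 0]
print(quantize(a, step) == quantize(b, step))     # True
print(dequantize(quantize(a, step), step))        # [10.0, 0.0, -4.0, 0.0]
```

    Both inputs decode to exactly the same output, so a post-processor looking only at that output has nothing that distinguishes them.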

  15. #15
    adrianxw (It's full of stars)
    Join Date
    Aug 2001
    Posts
    4,829
    >>> Which requires you to do MIDI analysis first,

    People were doing Fourier decomposition of waveforms and music years before MIDI was invented. I was fiddling with it myself in the same year the MIDI standard was first proposed, let alone finalised.

    The transform I propose will, of course, work better on a better-quality signal, and best on an analog signal.
