Thread: Music enhancement through statistical pattern recognition and approximation

  1. #16
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    But does Fourier decomposition give you individual instruments? I thought it just gives you individual sines.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  2. #17
    Sang-drax
    Join Date
    May 2002
    Location
    Göteborg, Sweden
    Posts
    2,072
    Quote Originally Posted by adrianxw
    The transform I propose will, of course, work better on a better quality signal, and best on an analog signal.
    Yes, you should start with a better quality signal, not with an MP3 stream.
    Last edited by Sang-drax : Tomorrow at 02:21 AM. Reason: Time travelling

  3. #18
    Sang-drax
    Join Date
    May 2002
    Location
    Göteborg, Sweden
    Posts
    2,072
    The next question is whether video streams can be compressed with some kind of pattern recognition. There are half-working algorithms able to reconstruct a 3D scene from a photograph. For each scene in a movie, a 3D scene would be reconstructed (much easier with multiple photographs, especially if the camera is moving), the textures would be extracted, and only the movement of the vertices would be saved. The compression ratio could be very high.

  4. #19
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    Light sources. If you don't want to save one texture per frame per object (something I imagine might take more space than the video itself), you have to analyze the lighting of the scene, so you can correctly recreate the old look.

  5. #20
    Sang-drax
    Join Date
    May 2002
    Location
    Göteborg, Sweden
    Posts
    2,072
    Quote Originally Posted by CornedBee
    Light sources. If you don't want to save one texture per frame per object (something I imagine might take more space than the video itself), you have to analyze the lighting of the scene, so you can correctly recreate the old look.
    Yes, and this will probably be very hard when both the objects and the light sources are moving. In extreme cases there could be a simpler algorithm to fall back on (e.g. MPEG).

  6. #21
    It's full of stars adrianxw
    Join Date
    Aug 2001
    Posts
    4,829
    >>> But does Fourier decomposition give you individual instruments?

    No, but then, you don't need the individual instruments. If you extract a bunch of ratty sines, correct the sines, and put them back together, you are re-assembling the signal as a whole.

    If you consider a "piece of music" where all of the "instruments" were sine wave generators, then you will be extracting the "instruments", correcting them, and reassembling the piece.

    If, however, one of the instruments is a square wave generator, you will not be extracting the square wave, but the series of sines that compose the square wave. Correcting those sines will still correct the square wave, but you will never have had the square wave isolated as an "instrument".

    The problem, as always, in dealing with a digitised signal is that the waveform you see is "stepped". In Fourier analysis terms, this adds a vast number of higher-frequency components to the signal. These will need some very intelligent filtering, but it could be done as long as the sample frequency of the original piece is known.
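
    To make the square-wave point concrete, here is a minimal Python sketch (not from the thread; `harmonic_amplitude` and the 1024-sample test signal are illustrative assumptions). It measures individual sine components of one period of a square wave with a naive DFT. Only the odd harmonics show up, with amplitudes close to 4/(pi*k), and at no point is a "square wave instrument" isolated.

```python
import math

def harmonic_amplitude(samples, k):
    """Amplitude of the k-th harmonic in one period of samples (naive DFT bin)."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

# One period of an ideal square wave: +1 for the first half, -1 for the second.
N = 1024
square = [1.0 if i < N // 2 else -1.0 for i in range(N)]

# Amplitudes of harmonics 1..5; the even ones come out as (numerically) zero.
amps = [harmonic_amplitude(square, k) for k in range(1, 6)]
for k, a in enumerate(amps, start=1):
    print(f"harmonic {k}: amplitude {a:.4f}")
```

    Correcting a recording in this scheme would mean adjusting those per-harmonic amplitudes (and phases) before summing the sines back together.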
    Wave upon wave of demented avengers march cheerfully out of obscurity unto the dream.

  7. #22
    Registered User
    Join Date
    Jul 2003
    Posts
    450
    How about this idea:
    Let's say you have a musical piece sampled at 128 kb/s,
    and you have the same piece sampled at 512 kb/s.

    The 512 kb/s file is the master file on a server, while the 128 kb/s file is on the client machine.

    The idea is to stream the missing data to the client to enhance the sound to the same quality as the 512 kb/s recording, without incurring the space penalty on the client machine or even necessarily having the original track on the client.

    The data would be synchronized and assembled in real time (or as close as you could get).

  8. #23
    Chad Johnson
    Join Date
    May 2004
    Posts
    154
    Why not just load the whole thing in memory off the server and not have the client copy? Wouldn't this put about the same strain on both machines?

  9. #24
    Registered User
    Join Date
    Jul 2003
    Posts
    450
    Digital rights. Suppose the artist doesn't want his master tracks copied.

  10. #25
    Chad Johnson
    Join Date
    May 2004
    Posts
    154
    If you're streaming from a central server, and the people hosting the server don't want you copying their music (which I think is what you mean), there would be ways around that.

    You could just run a cable from your sound card's audio out to the audio in and record the song that way. You could also get a program that can write directly from the sound card to a sound file on your hard drive. And quality loss would not be much of an issue there if they're streaming at that high a bitrate.

    What would be the difference between the master tracks and a copy of the master tracks, besides a little quality loss?

  11. #26
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    If you streamed the master as deltas, they would be kind of useless on their own ... hmm ...

  12. #27
    Registered User
    Join Date
    Jul 2003
    Posts
    450
    What are deltas in relation to streamed music?
    What are you thinking about?

    Ultimately the master track should be an analog recording. I was just thinking of a very high bitrate master file that could be used to fill in gaps in the client's copy so they could get true studio-quality sound.

    As for digital rights, some people will not even allow their music to be recorded, so what do you do then?

  13. #28
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    If the gap fillers sent are only offsets from the last value available on the client, the amount of data is the same (or perhaps even less), and someone without the client file couldn't use the server file. You could say the master file is encrypted using the client file as the key.

    Of course, that would require a new media format. MP3 can't do that.
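
    A minimal sketch of the delta idea in Python, under loud assumptions: both files are already decoded to integer samples on the same time grid, and `make_deltas` / `apply_deltas` are hypothetical names, not part of any real media format. The server keeps the master and streams only the per-sample differences from the client's copy.

```python
def make_deltas(master, client):
    """Server side: differences between the master samples and the client's copy."""
    if len(master) != len(client):
        raise ValueError("sample streams must be aligned to the same length")
    return [m - c for m, c in zip(master, client)]

def apply_deltas(client, deltas):
    """Client side: rebuild the master samples from the local copy plus deltas."""
    if len(client) != len(deltas):
        raise ValueError("delta stream does not match the local copy")
    return [c + d for c, d in zip(client, deltas)]

# Illustrative data: the client copy is a coarser version of the master.
master = [100, 103, 98, 120]
client = [96, 104, 96, 120]

deltas = make_deltas(master, client)      # small numbers, useless without `client`
restored = apply_deltas(client, deltas)   # equals `master` again
```

    The deltas are small values centred on zero, so they should compress well, and on their own they say little about the original waveform.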

  14. #29
    Registered User
    Join Date
    Jul 2003
    Posts
    450
    I was thinking along similar lines but did not want to get off topic with encryption. Besides, I have some moral quandaries about putting the idea out there in relation to music.

    The original post is more about artificial intelligence and musical analysis. So I am just going to drop it and go talk to my chat bot Desti.

    Have a good day people psi a nar a

  15. #30
    Registered User
    Join Date
    Jun 2003
    Posts
    361
    How about something along the lines of...

    All music has repetition, some less than others, some more, be it a bass line, a drum beat, etc. So some sort of MIDI-like form where you can tell the difference between instruments/channels would be a great first step toward simplicity.

    Then, you look for the patterns. Consider the whole sequence for one channel:
    0.0 8.1 4.2 6.9 9.5 7.8...
    Not much of a pattern.

    But if at one point during the song, you find the pattern:
    0.0 GAP GAP 6.9 GAP 7.8
    And at another point:
    GAP 8.1 GAP GAP 9.5 7.8
    And one more:
    0.0 8.1 4.2 GAP GAP GAP

    And you can find these overlapping similarities, then you can fill in the gaps based on the information from the other portions. Of course, perhaps you could allow a leeway of 0.1 in your values, so if the only thing holding you back from filling in a GAP is that one pattern says 4.2 and another says 4.3 (the drum was hit in a different spot, but with relatively the same output), you could still count it as valid and perhaps average the two.
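
    The gap filling described above can be sketched in Python as follows. This is only an illustration under assumptions: the occurrences are already aligned position by position, GAP is represented as `None`, and `merge_patterns` is a made-up name. Values that agree within the leeway are averaged; values that disagree by more are left as a gap.

```python
GAP = None  # assumed marker for a missing value

def merge_patterns(patterns, tolerance=0.1):
    """Fill gaps by combining aligned occurrences of the same pattern."""
    merged = []
    for pos in range(len(patterns[0])):
        known = [p[pos] for p in patterns if p[pos] is not GAP]
        if not known:
            merged.append(GAP)                       # nobody knows this value
        elif max(known) - min(known) <= tolerance + 1e-9:
            merged.append(sum(known) / len(known))   # agree within leeway: average
        else:
            merged.append(GAP)                       # conflicting values: stay a gap
    return merged

# The three partial occurrences from the post.
first = [0.0, GAP, GAP, 6.9, GAP, 7.8]
second = [GAP, 8.1, GAP, GAP, 9.5, 7.8]
third = [0.0, 8.1, 4.2, GAP, GAP, GAP]

full = merge_patterns([first, second, third])
print(full)
```

    Finding and aligning the occurrences in the first place is the hard part, and this sketch does not attempt it.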

    Of course, something like singing will yield fewer patterns than a bass line, but it could be somewhere to start.

    But if you can fill this pattern for multiple channels, then recombine them into a single file...well, I have no idea what would happen, or how you would split them up in the first place.

    Even Winamp comes with an Equalizer that controls the behaviour of different decibel levels in relation to another property (not sure what it is, but the units are K, and it may or may not be labeled preamp). But either way, there are a variety of things to reconstruct. The decibel levels, perhaps the frequency of the sounds if you can analyze that, etc.

    That's my best idea for a reconstruction method.
    Last edited by Epo; 01-05-2006 at 11:24 PM.
    Pentium 4 - 2.0GHz, 512MB RAM
    NVIDIA GeForce4 MX 440
    WinXP
    Visual Studio .Net 2003
    DX9 October 2004 Update (R.I.P. VC++ 6.0 Compatability)
