Thread: How To Measure "Difference" Between Images?

  1. #1
Registered /usr
Join Date: Aug 2001
Location: Newport, South Wales, UK
Posts: 1,273

    Question How To Measure "Difference" Between Images?

    Hello,

    So here I am, stuck overnight at work and whiling away the wee hours on another crazy coding spree.

Anyway, I am attempting to reason through the following:-
    I have 32 8-bit greyscale images which may/may not be similar;
    I want to have 16 images that best represent the original 32.

    My strategy:-
    Select the 16 most divergent images (the ones that are the least like the others);
    Cluster the other 16 around these, combining the clustered images together.

    But what's a good metric to determine how "like" two two-dimensional arrays are?

At the moment I'm doing abs(img1[n] - img2[n]) (yes, I'm hopeless). This can't deal with, for example, comparing images that are complements (inversions) of each other; they would be considered identical. I need something that takes the position of each pixel within the image into account, I think.

I have heard of the chi-square statistic being used to compare frames of video, but I'm not a math geek and would appreciate someone putting it through the blender for me, if that is what I need.
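For concreteness, what I'm doing boils down to something like this (a rough sketch, not my actual code; the function name is made up):

```c
#include <stdlib.h>   /* abs */
#include <stddef.h>   /* size_t */

/* Sum of absolute per-pixel differences between two 8-bit greyscale
   images of n pixels each. 0 means identical; bigger means less alike.
   Illustrative sketch only. */
unsigned long sad(const unsigned char *img1, const unsigned char *img2,
                  size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += (unsigned long)abs((int)img1[i] - (int)img2[i]);
    return total;
}
```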

  2. #2
and the hat of int overfl Salem
Join Date: Aug 2001
Location: The edge of the known universe
Posts: 39,660
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
Registered /usr
Join Date: Aug 2001
Location: Newport, South Wales, UK
Posts: 1,273
    Well I guess that'll keep me busy. Thanks

  4. #4
Officially An Architect brewbuck
Join Date: Mar 2007
Location: Portland, OR
Posts: 7,396
    Which technique to use depends on the problem requirements. Do you need to identify images which are scaled/rotated copies of each other? Is your inverted-color example just an example or an actual requirement? Etc.

abs(img1[n] - img2[n]), summed over all pixels, is called the sum of absolute differences (SAD) and is the technique used in most video compression algorithms, so I wouldn't call you "hopeless" for adopting it. It's just that it's not robust to certain types of transformations of the image. If you are more specific I can give some better ideas.

EDIT: You can fairly easily change your absolute-value technique into a correlation technique. First, compute the mean and variance of the pixel values in the image, and normalize them so that the mean is zero and the variance is one. After doing this to both images, compute sum(img1[n] * img2[n]) over all pixels. The result is the normalized cross-correlation at zero shift. If this value is close to zero, the images are not correlated; if it is positive, they are positively correlated; if it is negative, they are negatively correlated (inverted colors).

    Again, whether this works better than taking absolute differences depends on what you are trying to achieve.
    Last edited by brewbuck; 08-06-2013 at 02:19 PM.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  5. #5
Registered /usr
Join Date: Aug 2001
Location: Newport, South Wales, UK
Posts: 1,273
    Well, I sorta... am trying to reduce an image to up to 256 8x16 tiles... to load into character memory in 80x25 text mode.
    Just so you know, I did wet myself a little writing that.

This is better than various ASCII/ANSI art renderers, as they use the standard character set and typically need thousands of characters to render hundreds of pixels. My implementation would be 1:1.
    The only downside is depending on the image, fidelity is gonna get pretty bad. So having the right algorithm to "fold" different tiles together is crucial.
    Identifying rotations probably wouldn't help (I wouldn't want to lump symmetrical tiles together), slight scales/translations might be useful though, to preserve detail.

  6. #6
Officially An Architect brewbuck
Join Date: Mar 2007
Location: Portland, OR
Posts: 7,396
    So you are looking for a set of 256 8x16 bitmaps which in some sense "optimally" represents the images you want to draw? Then, given a particular chunk of image, to identify which of those tiles matches that image chunk best? Like, old-school custom font graphics?

    Assuming you say yes, the sum of absolute differences is quite a good method for determining which pre-made tile is the best match for a given image part. Determining the optimal set of tiles is more of an open-ended question, but I would take a large sample of the images you will be using (hopefully all of them), divide them into 8x16 chunks, then cluster them into 256 different clusters.

    I can propose clustering methods if you confirm that this is indeed what you are doing.
    Last edited by brewbuck; 08-06-2013 at 08:58 PM.

  7. #7
Registered /usr
Join Date: Aug 2001
Location: Newport, South Wales, UK
Posts: 1,273
Almost got it, except that mapping part of the image to a tile will modify the tile to include features from that part. Everywhere else that tile is used will take on that change.
As I said, it's likely to get noisy, but it's the best way I can think of doing this. That, and keeping the images small and greyscale.
    I'm developing the algorithm around a 256x192 test image, 384 tiles of input -> 256 tiles of output.
I am also mulling reserving one tile for the border and for areas of the image that are solid colour or close to it. Mapping parts to this tile would not modify the tile.
    Getting there. 8)
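A minimal sketch of the "fold" step described above, under two assumptions of my own: merging simply averages the incoming chunk into the tile, and one reserved tile index (0 here) is never modified:

```c
/* Fold an image chunk into the tile it was mapped to, by averaging each
   pixel pair. Tile 0 is assumed reserved for the border / solid-colour
   areas and is left untouched. Illustrative sketch only. */
void fold_into_tile(unsigned char *tile, const unsigned char *chunk,
                    int npix, int tile_index)
{
    if (tile_index == 0)
        return;   /* reserved solid/border tile: never modified */
    for (int i = 0; i < npix; i++)
        tile[i] = (unsigned char)(((int)tile[i] + (int)chunk[i]) / 2);
}
```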

  8. #8
Ticked and off
Join Date: Oct 2011
Location: La-la land
Posts: 1,728
    Quote Originally Posted by brewbuck View Post
    I would take a large sample of the images you will be using (hopefully all of them), divide them into 8x16 chunks, then cluster them into 256 different clusters.
    Me too, pretty much. Do you know any clever ways to include psychovisual modeling?

(I don't. I do know the brute-force way: first detect any human-noticeable details (edges, curves, intersections, corners, and dots), then use both the list of details and the pixel-difference statistics to compare chunks.)

    To those who are unfamiliar with the term "psychovisual modeling":

    To a human, \ and ╲ are very similar, as are x and × and X and ╳. However, | and ( are quite dissimilar. Applying the way humans perceive visual differences to quantifying differences or similarity, is applying psychovisual modeling.

    Psychoacoustic modeling is much more common. For example, when audio data is digitally compressed, the noise generated is often shaped so that the frequency spectrum matches the sensitivity of human hearing; this minimizes the perception of that noise.

  9. #9
C++ Witch laserlight
Join Date: Oct 2003
Location: Singapore
Posts: 28,413
    Quote Originally Posted by Nominal Animal
    To a human, \ and ╲ are very similar, as are x and × and X and ╳. However, | and ( are quite dissimilar. Applying the way humans perceive visual differences to quantifying differences or similarity, is applying psychovisual modeling.
Hmm... I recall reading an article about those optical illusions where, say, two lines of the same length are presented in slightly different contexts (due to surrounding lines) and hence perceived as being of different lengths. It mentioned that in certain non-urban cultures, people were not susceptible to the illusion; the theory the article highlighted is that these people spend most of their time in environments without such straight lines, so their perception develops differently. If so, wouldn't this mean that "to a human" might be an unwarranted generalisation in your statement? Or is psychovisual modeling able to account for such cultural/environmental factors?
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  10. #10
Ticked and off
Join Date: Oct 2011
Location: La-la land
Posts: 1,728
    Quote Originally Posted by laserlight View Post
    wouldn't this mean that "to a human" might be an unwarranted generalisation in your statement
    Yes and no. Yes, there are cultural and even individual differences as to exactly how humans perceive visual details. No, because the features I mentioned are general enough; they are perceived the same way by practically all humans.

    In fact, I do believe most of the research into this kind of visual metrics is actually done in relation to machine vision. I personally suspect that the importance of these features (straight lines, intersections, curves, spots) to perception is roughly common across most non-nocturnal mammals with good stereoscopic vision, not just humans.

    Quote Originally Posted by laserlight View Post
    is psychovisual modeling able to account for such cultural/environmental factors?
    In general, no. The same applies to psychoacoustic modeling, too.

For example, the frequency-specific sensitivity of hearing is very dependent on age, not just on cultural/environmental factors. As humans age, the high-frequency end of the spectrum is lost, mostly due to (normal) damage to the hair cells in the organ of Corti. Headphone use, listening to loud music, and working in a noisy industrial environment all affect this.

    Mostly, the models are quite, quite rough. We're nowhere near the level of detail where optical illusions work.

In my opinion, if optical illusions are the patterns in tree bark, then with psychovisual modeling we are roughly able to tell that there is a forest there, but not the kinds of trees in it. Fortunately, even that kind of rough model does yield useful results compared to e.g. plain pixel statistics.



    A recent Slashdot article mentions how D. Kriesel, a German researcher, has found that in some specific situations, some scanners/photocopiers alter numbers when scanning documents.

It seems that this is a side effect of the JBIG2 compression used internally by the scanners and photocopiers; it basically uses chunks much like the OP in this thread does, with pixel-based statistics to compare those chunks.

Unfortunately, there is just a small difference between a 6 and an 8, for example. When a chunk happens to sit just right on a single digit, that chunk is reused as-is elsewhere. That seems to be the reason for the 6 ↔ 8 changes in the examples, at least.

    This is an excellent example as to why psychovisual modeling should have been taken into account: the pixel statistics alone are just not enough to give a satisfactory result.

I do believe counting the number of "intersections" and "endpoints" (and perhaps discontinuous changes in curvature) in each chunk would be sufficient, and not too difficult or slow to implement. (The dataset a scanner/photocopier works on is rather big, and they don't have very powerful image-manipulation capabilities to start with.)
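To illustrate the endpoint/intersection counting I mean, here is a sketch. It assumes the bitonal chunk has already been thinned to one-pixel-wide strokes, and it classifies each set pixel by its number of set 8-neighbours (exactly one neighbour: endpoint; three or more: intersection). The function and its conventions are mine, for illustration only:

```c
/* Count stroke endpoints and intersections in a w-by-h bitonal chunk
   (img[y*w + x] nonzero = set pixel), as a crude feature signature to
   compare alongside plain pixel statistics. Assumes strokes are already
   thinned to one pixel wide. */
void count_features(const unsigned char *img, int w, int h,
                    int *endpoints, int *intersections)
{
    *endpoints = 0;
    *intersections = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            if (!img[y * w + x])
                continue;
            int nb = 0;   /* set 8-neighbours of this pixel */
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    if (!dx && !dy) continue;
                    int nx = x + dx, ny = y + dy;
                    if (nx >= 0 && nx < w && ny >= 0 && ny < h &&
                        img[ny * w + nx])
                        nb++;
                }
            if (nb == 1)       (*endpoints)++;
            else if (nb >= 3)  (*intersections)++;
        }
}
```

Two chunks whose feature counts differ (say, a "6" versus an "8") would then never be folded together, however close their pixel statistics are.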

  11. #11
Officially An Architect brewbuck
Join Date: Mar 2007
Location: Portland, OR
Posts: 7,396
    Quote Originally Posted by Nominal Animal View Post
    Me too, pretty much. Do you know any clever ways to include psychovisual modeling?
The most well-understood areas of human vision, from an image-processing perspective, are the frequency/contrast luminance response curves and chromatic sensitivity. I am not specifically aware of any research into the eye's sensitivity to small-scale variation in textures. Simple measures like SAD, Hamming distance (which is the same as SAD for bitonal images), or correlation are probably the best you're going to get unless someone can find the relevant research. I don't know it.

    I can pose this question to one of the human vision scientists here.
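For bitonal tiles the Hamming distance is just a popcount of the XOR. A sketch, under the assumption that the tiles are packed eight pixels per byte:

```c
#include <stddef.h>   /* size_t */

/* Hamming distance between two packed bitonal tiles: the number of pixel
   positions where they differ. Equivalent to SAD for 1-bit images. */
int hamming(const unsigned char *a, const unsigned char *b, size_t nbytes)
{
    int d = 0;
    for (size_t i = 0; i < nbytes; i++) {
        unsigned char x = a[i] ^ b[i];      /* differing bits */
        while (x) { d += x & 1; x >>= 1; }  /* count them */
    }
    return d;
}
```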

  12. #12
Ticked and off
Join Date: Oct 2011
Location: La-la land
Posts: 1,728
    Quote Originally Posted by brewbuck View Post
    I am not aware specifically of any research into the eye's sensitivity to small-scale variation in textures.
    I was referring more to the image processing in the human brain.

    I've seen some related research (I'm just interested in it, it's not at all my field) in machine vision, stuff like edge detection and estimating distances in single images (i.e. non-stereoscopic) using specific identified details like those I mentioned.

    While technically the researchers are loath to say it's psychovisual modeling, it really is what they're doing: they find ways to program (or in some cases teach, if using neural nets or similar) the machine to perceive or extrapolate similar information humans (the researchers themselves) derive from the same images.

    Quote Originally Posted by brewbuck View Post
    I can pose this question to one of the human vision scientists here.
    I suspect machine vision or image processing people might know more interesting techniques.



    D'oh! I'm stupid, or getting old, or both.

    I just now remembered where I first encountered this stuff: in optical character recognition, specifically feature detection. The Wikipedia article describes pretty well what I have been trying to describe here.

  13. #13
Registered /usr
Join Date: Aug 2001
Location: Newport, South Wales, UK
Posts: 1,273
    While you intellectuals are busy sizing each other up, I thought you might be interested in seeing the test image that I'm using:-
    https://ece.uwaterloo.ca/~z70wang/re...s/image024.gif
    (Credit: Zhou Wang, University of Waterloo)

    I'm banking on the similarity in the roofing to quantize into tiles better than other kinds of detail.

  14. #14
Ticked and off
Join Date: Oct 2011
Location: La-la land
Posts: 1,728
    Quote Originally Posted by SMurf View Post
    I'm banking on the similarity in the roofing to quantize into tiles better than other kinds of detail.
    Photographs are usually easier to compress than other kinds of images, and don't need the kind of handling I was talking about.

After all, the manufacturers seem to have been completely surprised that scanners and photocopiers using a very similar method (chunking, and reusing chunks that approximately match) caused severe problems with numeric data.

    What I am saying is that when your implementation works fine with photographs, do not assume it will work fine for all other use cases too.
