Scene / image recognition

**rogster001** · 07-20-2014

Hi all, I would appreciate any thoughts on a new project I wish to undertake. I am rather fascinated by the idea of image recognition software, and I would like to write something that can pick categories of items from a scene. I was wondering about the approach to this sort of thing. The idea of a template 'lookup' definition came to mind, with a library of basic objects as a starting point; then I could use heuristics etc. in an evaluation routine. I'd be happy if I just got the thing to output 'two cups', 'two saucers'. But I think even with a two-tone silhouette this would be tough, never mind the problems with a photograph, a multitude of angles and images etc. Thanks for any thoughts on different approaches to this.

**Sebastiani** · 07-21-2014

This is a very hard problem, obviously, but one technique I've used in the past turned out to be fairly simple to implement, remarkably accurate, and computationally cheap: the humble wavelet. There's a little more to it, of course. The color data was first converted to a meaningful color space and then normalized to a signed-number representation before convolution. Statistical calculations were then applied to each "level" of the multiresolution result, which was then compared to that of another image to come up with an overall final "score". Well, that's the basic idea, anyhow.
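For anyone who wants to experiment, here is a rough sketch of that kind of pipeline in plain Python. It is not the code described in the post, only one possible reading of it, and it makes several assumptions: a single grayscale channel instead of a converted color space, an image whose sides are divisible by 2 for every level, and mean absolute value plus standard deviation as the per-level statistics. The names `haar_step`, `signature` and `score` are made up for illustration.

```python
import math

def haar_step(img):
    """One 2D Haar level on an even-sized image: returns (approx, details)."""
    h, w = len(img), len(img[0])
    approx, details = [], []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            row.append((a + b + c + d) / 4.0)           # low-pass average
            details.extend([(a - b + c - d) / 4.0,      # left-right difference
                            (a + b - c - d) / 4.0,      # top-bottom difference
                            (a - b - c + d) / 4.0])     # diagonal difference
        approx.append(row)
    return approx, details

def signature(pixels, levels=3):
    """Per-level (mean |d|, std dev) of the Haar detail coefficients."""
    # Assumption: 8-bit grayscale input, mapped from 0..255 to roughly -1..1.
    img = [[p / 127.5 - 1.0 for p in row] for row in pixels]
    sig = []
    for _ in range(levels):
        img, det = haar_step(img)
        mean_abs = sum(abs(d) for d in det) / len(det)
        mean = sum(det) / len(det)
        var = sum((d - mean) ** 2 for d in det) / len(det)
        sig.append((mean_abs, math.sqrt(var)))
    return sig

def score(sig_a, sig_b):
    """Equal-weight distance between two signatures; 0.0 means identical."""
    total = 0.0
    for (ma, sa), (mb, sb) in zip(sig_a, sig_b):
        total += abs(ma - mb) + abs(sa - sb)
    return total / len(sig_a)
```

Calling `score(signature(image_a), signature(image_b))` on two same-sized images gives a non-negative number, with smaller values meaning more similar texture statistics. Which statistics and which distance work best is something to tune by experiment.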

[EDIT]
Re-reading your post now, I realize that you're asking for something a bit more elaborate than what I described (a means to identify each object in the scene). Finding multiple objects is just a matter of iterating over the results and grouping them - nothing too challenging, really...
[/EDIT]

**vart** · 07-21-2014

I would try to combine the fingerprint approach used by digiKam for fuzzy search of similar images with edge processing, which could be used to split the image into separate objects, and then compute a fingerprint for each subarea separately.
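The "split into separate objects" step could be prototyped along these lines. This is only a sketch and it swaps in a simpler technique: instead of real edge processing it labels 4-connected foreground blobs in a two-tone mask (1 = object, 0 = background) and returns one bounding box per blob. It is not digiKam's algorithm, and `find_regions` and `crop` are hypothetical helper names.

```python
from collections import deque

def find_regions(mask):
    """Return bounding boxes (top, left, bottom, right) of 4-connected blobs."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from this unvisited pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def crop(image, box):
    """Cut the sub-image covered by an inclusive bounding box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

Each cropped subarea can then be fingerprinted on its own. On a photograph the mask would first have to come from thresholding or an edge detector, which is where most of the difficulty lies.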

**rogster001** · 07-22-2014

Originally Posted by Sebastiani

nothing too challenging, really...

Hehe, it's up there... I was considering something more like what vart has suggested: trying to obtain data that can be broken down to a 'shape' parameter. I'm definitely thinking edge detection, and then my idea was to have a template library to compare that data against. I suppose that would be required anyway?

[EDIT] I see Sebastiani had already indicated use of a lookup
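A toy version of the "shape parameter plus template library" idea might look like the following. Everything here is an assumption made for illustration: the descriptor (aspect ratio and fill ratio of the silhouette's bounding box), the squared-distance comparison, and the function names. The mask is assumed to contain at least one foreground pixel.

```python
def shape_descriptor(mask):
    """(aspect ratio, fill ratio) of the foreground in a two-tone mask."""
    ys = [y for y, row in enumerate(mask) for v in row if v]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return (width / height, len(xs) / (width * height))

def classify(mask, library):
    """Return the name of the library entry with the nearest descriptor."""
    d = shape_descriptor(mask)

    def dist(name):
        t = library[name]
        return (d[0] - t[0]) ** 2 + (d[1] - t[1]) ** 2

    return min(library, key=dist)

# Hypothetical template library built from crude placeholder silhouettes:
# a square block standing in for a cup, a flat strip for a saucer.
library = {
    "cup": shape_descriptor([[1, 1, 1]] * 3),
    "saucer": shape_descriptor([[1, 1, 1, 1, 1]]),
}
```

Two numbers are far too little to tell real objects apart, so treat this only as the skeleton of the lookup; the descriptor is the part that would need real work.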

**rogster001** · 07-22-2014

Originally Posted by Sebastiani

Statistical calculations were then applied to each "level" of the multiresolution result, which was then compared to that of another image to come up with an overall final "score"

Thanks, that would be something I'd like to explore. I get using different resolutions, yes. Would you start from the lowest resolution? That seems the natural one for speed, maybe?

**Sebastiani** · 07-22-2014

Originally Posted by rogster001

Thanks, that would be something I'd like to explore. I get using different resolutions, yes. Would you start from the lowest resolution? That seems the natural one for speed, maybe?

I would say all of them, preferably. I employed an equal-weight contribution at every scale, but I suppose you could get even better results by assigning higher weights to the higher resolutions of the wavelet data.
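To make the weighting idea concrete, here is a small sketch. It assumes you already have one difference value per wavelet level (how those are computed is up to you) and combines them as a weighted mean. The function name, the level ordering and the example weights are all hypothetical.

```python
def weighted_score(level_diffs, weights=None):
    """Combine per-level differences into one score (weighted mean)."""
    if weights is None:
        # Equal-weight contribution at every scale.
        weights = [1.0] * len(level_diffs)
    if len(weights) != len(level_diffs):
        raise ValueError("need exactly one weight per level")
    return sum(w * d for w, d in zip(weights, level_diffs)) / sum(weights)

# Assumption: index 0 is the finest (highest-resolution) level.
diffs = [0.2, 0.4, 0.6]                      # made-up illustrative values
equal = weighted_score(diffs)                # every scale counts the same
fine_heavy = weighted_score(diffs, [3.0, 2.0, 1.0])  # favor fine detail
```

Whether favoring the fine levels actually helps depends on the images, so it is worth comparing both variants on a few known-similar and known-different pairs.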

Originally Posted by rogster001

I was considering something more like what vart has suggested: trying to obtain data that can be broken down to a 'shape' parameter. I'm definitely thinking edge detection, and then my idea was to have a template library to compare that data against. I suppose that would be required anyway?

Sure, you can do that, but don't expect spectacular results. Maybe couple it with some other method to obtain a hybrid comparison of the data? I used nothing more than a simple Haar band-pass filter and still achieved amazing results (not to mention it being fast and efficient).
