Hash_map, STL maps query

**alvifarooq** · 06-03-2005

1) what's the performance benefit of using hash_maps over normal maps?

2) can i set multiple keys in either?

3) if i have a struct (with element: int x, y

as an entry in the maps and I want to use this as a key, the sort order will be according to what? OR do I have to use let's say x as the other pair and set that as the key?

4) Multiple entries (same keys) in a map are overwritten. is it possible to count how many times this overwrite occurs and for which keys, without using other variables (except for storing answers, not for comparisions)?

Thanks!

**Daved** · 06-03-2005

A hash map with a good hash function and appropriate size provides constant time lookups and insertions. The regular STL map is generally implemented as a tree that provides O(log n) lookups and insertions.

The hash_map depends on your implementation, since it is not standard, but most likely neither a hash_map nor a regular map will allow multiple keys. You might consider storing things in a vector and sorting them based on the appropriate key or storing things in two different maps.

If you are asking if you can have multiple entries with the same key in a map, the answer is no. But there is an STL container called multimap that provides this functionality. The syntax is similar but not the same as a regular map. I would assume that implementations of hash_map might also provide for multiple entries with the same key, possibly following the STL technique by providing a multihash_map or similar.

If you have a struct as the key, you can (and should) define the comparison to be used to sort the map. This is likely a less than comparison for the STL map and possibly an equals comparison for a hash map. For an x, y, your comparison could check x first, and only if it is equal check y.

If you use a map, then the return value of insert tells you whether the value was inserted or not. You can use that to keep track of whether you are overwriting data or not. Or you could just use a multimap and all the entries will be kept.

**alvifarooq** · 06-04-2005

thanks Daved. i guess two maps will have to do for the comparisions. It would be easier if I had a container that stored multiple number of pairs instead of just one pair. That way I could choose one element of a pair to make a key and when I wanted to I could have chosen another. Something like:

Code:

struct ; int x  ;  int z
//note there are 3 elements in the container instead of a pair
e.g.
(x=1,y=1)  ;  x = 1  ;  z = 0 

So, I could make x a key in once instance and z the key in another.

any containers in C++ capable of that? or if not the multiple key thing, atleast capable of storing multiple elements in one entry?

**Prelude** · 06-04-2005

>any containers in C++ capable of that?
Sure, to a certain sanity point. It takes a little more work, but you can get a rough multi-keyed map using std::pair:

Code:

#include <iostream>
#include <map>
#include <utility>

namespace jsw {
  struct compare {
    bool operator()(const std::pair<int, int>& a, const std::pair<int, int>& b) const
    {
      if (a.first == b.first)
        return a.first < b.first;
      else
        return a.second < b.second;
    }
  };

  typedef std::map<std::pair<int, int>, std::pair<int, int>, compare> mymap_t;

  std::pair<int, int> key(int k)
  {
    return std::make_pair(k, k);
  }

  std::pair<std::pair<int, int>, std::pair<int, int> > value(int k1, int k2, int v1, int v2)
  {
    return std::make_pair(std::make_pair(k1, k2), std::make_pair(v1, v2));
  }
}

int main()
{
  jsw::mymap_t m;

  m.insert(jsw::value(1, 0, 1, 1));
  m.insert(jsw::value(3, 5, 2, 6));

  jsw::mymap_t::iterator it = m.find(jsw::key(5));

  if (it != m.end())
    std::cout<<"Found: ("<< it->second.first <<", "<< it->second.second <<")\n";

  it = m.find(jsw::key(1));

  if (it != m.end())
    std::cout<<"Found: ("<< it->second.first <<", "<< it->second.second <<")\n";

  it = m.find(jsw::key(0));

  if (it != m.end())
    std::cout<<"Found: ("<< it->second.first <<", "<< it->second.second <<")\n";

  it = m.find(jsw::key(3));

  if (it != m.end())
    std::cout<<"Found: ("<< it->second.first <<", "<< it->second.second <<")\n";
}

Of course, tailoring it to your needs is up to you, but the standard containers are very powerful if you're willing to work for it.

**alvifarooq** · 06-04-2005

thanks Prelude...do you think this would be 'substantially' better than using two maps...each with a different key value? would be straightforward enough for but the performance hit would be bad because just to get multiple keys i would have massive duplication of data! and then data stored in the map is also massive.

**Prelude** · 06-04-2005

>do you think this would be 'substantially' better than using two
>maps...each with a different key value?
It's a trade-off. You can minimize your memory usage by forcing everything into a single map and avoiding the overhead of another object, but you also make the program significantly more complicated in the process. Whether it's substantially better is your decision, which should only be made after extensive testing.

**SilentStrike** · 06-04-2005

So how is that code working? I guess the secret sauce is in the comparision operator, but it just looks weird to me.

Code:

bool operator()(const std::pair<int, int>& a, const std::pair<int, int>& b) const
    {
      if (a.first == b.first)
        return a.first < b.first;
      else
        return a.second < b.second;
    }

Is that equivalent to

Code:

bool operator()(const std::pair<int, int>& a, const std::pair<int, int>& b) const
    {
      if (a.first == b.first) return false;
      return a.second < b.second;
    }

**Prelude** · 06-05-2005

>So how is that code working?
Luck, just like everything else I write.

>Is that equivalent to
Not really, even though it works with the way I've set up the key structure.

**Daved** · 06-05-2005

Another option that may or may not be a good fit for you is to hold the data in a single vector. Write two comparison functions, one for each potential key. Add all your data into your vector. When you need to lookup by the first key, sort the vector with the comparison function based on that key, then use an STL algorithm like binary_search to find the entry. If you need to lookup something based on the second key, sort the vector with the comparison function based on that key and then use binary_search.

This has the advantage of saving space - there is no duplicated data. However, it is very inefficient if you need to switch back and forth between the keys that you use. It is probably only appropriate if you gather all your data at once first, then do lookups based on one key type a bunch of times, then do lookups on the other key type a bunch of times, so that you aren't constantly sorting your vector.

**alvifarooq** · 06-06-2005

this is what i'm planning to do now (since potentially the dataset could be huge)...please comment!

Code:

\\MULTIMAP called DATA made of struct {int x, int y} and int z (key, duplicates allowed)
\\MAP called Key1 made of something and int z (no duplicates here)
\\MULTIMAP called Key2 made of struct {int x, int y} and int x (duplicates allowed)
\\I enter around 10000 values in DATA.

for all Key1 entries //i.e. all z values
{

   for all DATA.find (Key1)
   {
      //push DATA.struct and DATA.struct.x on Key2 as pairs
      //Key2 is ordered by x now
      //Output to screen
   }

}

Only pseudocode (so don't mind the programming errors!). If you notice int z and int x are the different keys. You think this is a good way?

MORE EXPLANATION FOR ORIGINAL PROBLEM (if you're interested)
*************************
- Consider a straight line through the origin with gradient = 1. Consider 6 points on this line at the coordinates: (1,1), (2,2), (3,3), (6,6), (7,7), (8,8). All coordinates have the same y-intercept at (0,0).

- Let's say I want to see which coordinates are connected to each other. The criteria is that they should be equally spaced apart. On the basis of this you get two line segments on the y = x line which goes through all these points.

Segment 1: (1,1), (2,2), and (3,3)
Segment 2: (6,6), (7,7), and (8,8)

My original MULTIMAP (called DATA) stores these coordinates and has the y-intercept (z) value as key (I have several of such points with different z. I want to cluster them together based on z).

The second MULTIMAP (called Key2) stores coordinates with the same z, and sets the x-cordinate as the key.

Do you see my concern? I need to have these keys, first to filter out the same z's and then to arrange them in ascending order of x-coordinate to print. Please help!

**CornedBee** · 06-06-2005

Look at this:
http://www.boost.org/libs/multi_index/doc/index.html

**alvifarooq** · 06-06-2005

thanks CornedBee! looks pretty much exactly what I was looking for. One last question and I don't want to start a new thread on it because I found some info. using Search anyway...

What I'm developing is a bioinformatics application and for while I want to write it out in C++, there are some features from applications like MATLAB etc which I would like to integrate in my application (like using the MATLAB plot features etc). I'd like to know if wrappers are the best way to go in such types of app. development. Performance is the single most important reason I'm looking towards C++ (I have experience in both). You think I'm better off not integrating the MATLAB compiler and using Win32 API for the plot?

**Zach L.** · 06-06-2005

Is there any reason to build the plot viewer directly in? Could using an external tool work? Perhaps MATLAB, or Gnuplot, or the like. i.e. simply output the results you want to an appropriately formatted file, and call the external tool. Perhaps a using a script as a wrapper to your program is worthwhile, even (call your program, get your output, format it, save it, call the external tool, all with the simplicity of something like Perl).

It just seems that if you don't need the plotting capabilities built in, then you'd save a lot of work this way.

Cheers

**alvifarooq** · 06-07-2005

i kind of wanted the entire thing to be a standalone application...so i want a very simple GUI in C++ where the user enters the sequences, makes selections etc...(this I can do with Borland C++), and then to make the plot, instead of generating something like that from C++, I simply output the results to MATLAB...BUT I don't want the user to necessarily have MATLAB installed on their system...wouldn't that mean that I include the MATLAB compiler?

**CornedBee** · 06-07-2005

Are you even allowed to do that?

Thread: Hash_map, STL maps query

Thread Tools

Search Thread

Display

Hash_map, STL maps query

Similar Threads

Problems with strings as key in STL maps

For those knowing STL maps...

Prime Number Generator... Help !?!!

include question

STL - Maps