Which data structure would be best?

**Homunculus** · 07-16-2013

I have a list of 256 numbers that can range in value up to 256. Each of these numbers are associated to another value so I thought std;:map would be great until I realized there is no function for sorting by value rather than key which is what I need. I could dump the std::map into a new one which has the key/value swapped and it would sort it as I dump it in but that is far from efficient.

I really wish I could just use <algorithm>'s sort on a 2d array. The more I think about this maybe it'd just be best to make a struct, overload an operator and make a comparison functor and pass that into <algorithm>'s sort.

I want to know if there is any data structure that would be convenient for what I want to do? It just seems like a bit of a hassle to do this for sorting an array >.<;

**vart** · 07-16-2013

256 is very small number

why exactly you need to sort data? What operations you need to do?

**iMalc** · 07-16-2013

I don't understand why you think you would want to sort based on value rather than key. If there is something you want to sort by then you make that the key, and make whatever you thought was going to be the key into the value. If it is possible to just sort the data e.g. in an array, then it is possible to use a map.

However, with only 256 possible values, and where each item is so small, you'll get the best performance out of just putting it into an array and sorting that.

So how does a 2D array come into it? Your description seems to change as it goes along. What are you really trying to do?
Why not flatten any 2D array into a 1D array, with bitmap style addressing?

**Homunculus** · 07-16-2013

Write now I am writing code to do Huffman-Coding. It's more of a programming exercise for myself since I haven't done any in a long time. I understand with just 256 values (maximum) performance is negligible for the most part. I just want to write this to be as clean as possible. It's also important I don't link to any libs such as boost for bimaps etc.

I can explain why I think I can't use std::map well. It might be easier to just show a bit of code:

Code:

std::map<char, int> frequency;
std::pair<std::map<char,int>::iterator,bool> check;
	
	//get frequency of bytes 
	for(int x = 0; x < size; x++){
		check = frequency.insert(std::pair<char,int>(data[x],1)); 
		if(check.second == false){
			check.first->second += 1;
		}
	}

//...

std::map worked great at this part mainly because I don't want duplicates of reoccuring bytes, I just want to record the frequency that they occur and pair them with the byte value.

this code does do that. the problem is that now that it's been done such that the key is the byte-value and the value is the frequency it's sorted the wrong way. I then need to have it sorted by value. I've done this just using an array, wrote up a quick merge sort and let it go.

**oogabooga** · 07-16-2013

You might be overthinking this. How about:

Code:

    int values[256] = {0};

    //get frequency of bytes 
    for(int x = 0; x < size; x++)
        values[data[x]]++;

And then just sort it. It's only 256 elements.

EDIT:
Actually, if you sort it you'll lose the character connection.
So instead, sort a parallel index array:

Code:

int bytes[256];
for (i=0; i<256; i++) bytes[i] = i;

int comp(const void* a, const void* b) {
    return values[*(int*)a] - values[*(int*)b];
}

qsort(bytes, 256, sizeof(int), comp);

**Homunculus** · 07-16-2013

Already did that. It works. The problem is after I sort values[] I will lose the information on which byte had that particular frequency. I could then make the array 2d where I store in values[data[x]][1] the byte value that frequency relates to. basically:

Code:

values[data[x]][1] = data[x];
//sort keeping values[x][0] and values[x][1] paired.

I was just thinking if there was a better structure to use... like an associative priority queue. maybe like a priority queue of <set>s lol. Actually that might be exactly what I want >.>;

**oogabooga** · 07-16-2013

Reread my last post. I edited it.
Here's an example of the idea (in C):

Code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <ctype.h>

#define SIZE 1000

int values[256] = {0};
int bytes[256];

int comp(const void* a, const void* b) {
    return values[*(int*)b] - values[*(int*)a];
}

int main(void) {
  int i;
  srand(time(0));

  for (i=0; i<SIZE; i++) values[rand()%256]++;
  for (i=0; i<256; i++) bytes[i] = i;
  qsort(bytes, 256, sizeof(int), comp);
  
  for (i=0; i<256; i++) {
    int j = bytes[i];
    printf("%3d (%c) %3d\n", j, isprint(j)?j:'.', values[j]);
  }

  return 0;
}

**Homunculus** · 07-16-2013

Thanks for the help =). I really was over thinking it.

**rcgldr** · 07-16-2013

You can either use two integer arrays as shown above, or an array of structures, with each structure containing two integers.

**Elysia** · 07-17-2013

oogabooga: What is stopping you from making a C++ example in the C++ section? Strive for it!

**oogabooga** · 07-17-2013

Originally Posted by Elysia

oogabooga: What is stopping you from making a C++ example in the C++ section? Strive for it!

Grumble, grumble...

Code:

/*
Counts number of times each character appears in input file.
Prints results sorted by count.
Can also access counts by character.

Sample output (on source file starting with first include):
>g++ -Wall -Wextra -o freq.exe freq.cpp
>freq freq.cpp
20       561    0.2134
74  t    195    0.0742
65  e    137    0.0521
72  r    128    0.0487
6e  n    113    0.0430
73  s    105    0.0399
6f  o     99    0.0377
63  c     93    0.0354
69  i     88    0.0335
0a  .     88    0.0335
61  a     87    0.0331
3a  :     68    0.0259
3c  <     66    0.0251
6d  m     62    0.0236
64  d     58    0.0221
29  )     56    0.0213
28  (     55    0.0209
3b  ;     52    0.0198
75  u     51    0.0194
...
3f  ?      1    0.0004
34  4      1    0.0004
Total characters: 2629
Different characters: 69
Number of semicolons: 52
*/
#include <iostream>
#include <iomanip>
#include <fstream>
#include <vector>
#include <algorithm>
 
typedef unsigned char uchar;

struct Letter {
    uchar m_letter;  // unsigned to avoid sign-extension
                     // when using/printing as a number
    int   m_count;

    void set(uchar c) { m_letter = c; m_count = 0; }
    void inc() { m_count++; }
    std::ostream& prn(std::ostream& os, int denom=0);
};

std::ostream& Letter::prn(std::ostream& os, int denom) {
    char c = isprint(m_letter) ? m_letter : '.';
    os << std::hex << std::setw(2) << std::setfill('0')
       << static_cast<int>(m_letter)
       << "  " << c << "  "
       << std::dec << std::setw(5) << std::setfill(' ')
       << m_count;
    if (denom > 0)
        os << "    " << std::setprecision(4) << std::fixed
           << (double)m_count / denom;
    os << '\n';
    return os;
}

class FreqCntr {
    static const int NCHARS = 256;
    std::vector<Letter> m_letters;
    std::vector<int>    m_countOrder;
    int                 m_total;

public:
    FreqCntr() : m_total(0) {
        m_letters.resize(NCHARS);
        m_countOrder.resize(NCHARS);
    }
    bool operator()(int a, int b) { // comparator for sort
        return m_letters.at(a).m_count > m_letters.at(b).m_count;
    }
    int getCount(uchar c) { return m_letters.at(c).m_count; }
    void count(const char* filename);
    void read(const char* filename);
    std::ostream& prnByCount(std::ostream& os);
};

void FreqCntr::read(const char* filename) {
    std::ifstream is(filename); //, std::ios::binary);
    for (char c = 0; is.get(c); m_total++)
        m_letters.at(static_cast<int>(static_cast<uchar>(c))).inc();
}

void FreqCntr::count(const char* filename) {
    for (int i = 0; i < NCHARS; i++) {
        m_letters.at(i).set(static_cast<uchar>(i));
        m_countOrder.at(i) = i;
    }
    read(filename);
    std::sort(m_countOrder.begin(), m_countOrder.end(), *this);
}

std::ostream& FreqCntr::prnByCount(std::ostream& os) {
    int i = 0;
    for ( ; i < FreqCntr::NCHARS; i++) {
        Letter &lt = m_letters.at(m_countOrder.at(i));
        if (lt.m_count == 0) break; // only print non-zero counts
        lt.prn(os, m_total);
    }
    os << "Total characters: " << m_total << '\n';
    os << "Different characters: " << i << '\n';
    return os;
}

int main(int argc, char** argv) {
    if (argc < 2) { std::cerr << "Needs a filename.\n"; return 1; }
    FreqCntr fc;
    fc.count(argv[1]);
// Can access by count order:
    fc.prnByCount(std::cout);
// Can access by character:
    std::cout << "Number of semicolons: " << fc.getCount(';') << '\n';
}

Thread: Which data structure would be best?

Thread Tools

Search Thread

Display

Which data structure would be best?

Similar Threads

which the best way to store duplicate data in tree (maybe tree data structure)?

difference between data object and data structure?

Data structure for storing serial port data in firmware

data structure design for data aggregation

Data Structure in C++