Thread: Which data structure would be best?

  1. #1
    Registered User
    Join Date
    Feb 2006
    Posts
    71

    Which data structure would be best?

    I have a list of 256 numbers that can range in value up to 256. Each of these numbers are associated to another value so I thought std;:map would be great until I realized there is no function for sorting by value rather than key which is what I need. I could dump the std::map into a new one which has the key/value swapped and it would sort it as I dump it in but that is far from efficient.

    I really wish I could just use <algorithm>'s sort on a 2d array. The more I think about this maybe it'd just be best to make a struct, overload an operator and make a comparison functor and pass that into <algorithm>'s sort.

    I want to know if there is any data structure that would be convenient for what I want to do? It just seems like a bit of a hassle to do this for sorting an array >.<;

  2. #2
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    256 is very small number

    why exactly you need to sort data? What operations you need to do?
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler

  3. #3
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    I don't understand why you think you would want to sort based on value rather than key. If there is something you want to sort by then you make that the key, and make whatever you thought was going to be the key into the value. If it is possible to just sort the data e.g. in an array, then it is possible to use a map.

    However, with only 256 possible values, and where each item is so small, you'll get the best performance out of just putting it into an array and sorting that.

    So how does a 2D array come into it? Your description seems to change as it goes along. What are you really trying to do?
    Why not flatten any 2D array into a 1D array, with bitmap style addressing?
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  4. #4
    Registered User
    Join Date
    Feb 2006
    Posts
    71
    Write now I am writing code to do Huffman-Coding. It's more of a programming exercise for myself since I haven't done any in a long time. I understand with just 256 values (maximum) performance is negligible for the most part. I just want to write this to be as clean as possible. It's also important I don't link to any libs such as boost for bimaps etc.

    I can explain why I think I can't use std::map well. It might be easier to just show a bit of code:

    Code:
    std::map<char, int> frequency;
    std::pair<std::map<char,int>::iterator,bool> check;
    	
    	//get frequency of bytes 
    	for(int x = 0; x < size; x++){
    		check = frequency.insert(std::pair<char,int>(data[x],1)); 
    		if(check.second == false){
    			check.first->second += 1;
    		}
    	}
    
    //...
    std::map worked great at this part mainly because I don't want duplicates of reoccuring bytes, I just want to record the frequency that they occur and pair them with the byte value.

    this code does do that. the problem is that now that it's been done such that the key is the byte-value and the value is the frequency it's sorted the wrong way. I then need to have it sorted by value. I've done this just using an array, wrote up a quick merge sort and let it go.

  5. #5
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    You might be overthinking this. How about:
    Code:
        int values[256] = {0};
    
        //get frequency of bytes 
        for(int x = 0; x < size; x++)
            values[data[x]]++;
    And then just sort it. It's only 256 elements.

    EDIT:
    Actually, if you sort it you'll lose the character connection.
    So instead, sort a parallel index array:
    Code:
    int bytes[256];
    for (i=0; i<256; i++) bytes[i] = i;
    
    int comp(const void* a, const void* b) {
        return values[*(int*)a] - values[*(int*)b];
    }
    
    qsort(bytes, 256, sizeof(int), comp);
    Last edited by oogabooga; 07-16-2013 at 03:28 PM.

  6. #6
    Registered User
    Join Date
    Feb 2006
    Posts
    71
    Already did that. It works. The problem is after I sort values[] I will lose the information on which byte had that particular frequency. I could then make the array 2d where I store in values[data[x]][1] the byte value that frequency relates to. basically:

    Code:
    values[data[x]][1] = data[x];
    //sort keeping values[x][0] and values[x][1] paired.
    I was just thinking if there was a better structure to use... like an associative priority queue. maybe like a priority queue of <set>s lol. Actually that might be exactly what I want >.>;

  7. #7
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    Reread my last post. I edited it.
    Here's an example of the idea (in C):
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <ctype.h>
    
    #define SIZE 1000
    
    int values[256] = {0};
    int bytes[256];
    
    int comp(const void* a, const void* b) {
        return values[*(int*)b] - values[*(int*)a];
    }
    
    int main(void) {
      int i;
      srand(time(0));
    
      for (i=0; i<SIZE; i++) values[rand()%256]++;
      for (i=0; i<256; i++) bytes[i] = i;
      qsort(bytes, 256, sizeof(int), comp);
      
      for (i=0; i<256; i++) {
        int j = bytes[i];
        printf("%3d (%c) %3d\n", j, isprint(j)?j:'.', values[j]);
      }
    
      return 0;
    }

  8. #8
    Registered User
    Join Date
    Feb 2006
    Posts
    71
    Thanks for the help =). I really was over thinking it.

  9. #9
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    You can either use two integer arrays as shown above, or an array of structures, with each structure containing two integers.

  10. #10
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    oogabooga: What is stopping you from making a C++ example in the C++ section? Strive for it!
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  11. #11
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    Quote Originally Posted by Elysia View Post
    oogabooga: What is stopping you from making a C++ example in the C++ section? Strive for it!
    Grumble, grumble...
    Code:
    /*
    Counts number of times each character appears in input file.
    Prints results sorted by count.
    Can also access counts by character.
    
    Sample output (on source file starting with first include):
    >g++ -Wall -Wextra -o freq.exe freq.cpp
    >freq freq.cpp
    20       561    0.2134
    74  t    195    0.0742
    65  e    137    0.0521
    72  r    128    0.0487
    6e  n    113    0.0430
    73  s    105    0.0399
    6f  o     99    0.0377
    63  c     93    0.0354
    69  i     88    0.0335
    0a  .     88    0.0335
    61  a     87    0.0331
    3a  :     68    0.0259
    3c  <     66    0.0251
    6d  m     62    0.0236
    64  d     58    0.0221
    29  )     56    0.0213
    28  (     55    0.0209
    3b  ;     52    0.0198
    75  u     51    0.0194
    ...
    3f  ?      1    0.0004
    34  4      1    0.0004
    Total characters: 2629
    Different characters: 69
    Number of semicolons: 52
    */
    #include <iostream>
    #include <iomanip>
    #include <fstream>
    #include <vector>
    #include <algorithm>
     
    typedef unsigned char uchar;
    
    struct Letter {
        uchar m_letter;  // unsigned to avoid sign-extension
                         // when using/printing as a number
        int   m_count;
    
        void set(uchar c) { m_letter = c; m_count = 0; }
        void inc() { m_count++; }
        std::ostream& prn(std::ostream& os, int denom=0);
    };
    
    std::ostream& Letter::prn(std::ostream& os, int denom) {
        char c = isprint(m_letter) ? m_letter : '.';
        os << std::hex << std::setw(2) << std::setfill('0')
           << static_cast<int>(m_letter)
           << "  " << c << "  "
           << std::dec << std::setw(5) << std::setfill(' ')
           << m_count;
        if (denom > 0)
            os << "    " << std::setprecision(4) << std::fixed
               << (double)m_count / denom;
        os << '\n';
        return os;
    }
    
    class FreqCntr {
        static const int NCHARS = 256;
        std::vector<Letter> m_letters;
        std::vector<int>    m_countOrder;
        int                 m_total;
    
    public:
        FreqCntr() : m_total(0) {
            m_letters.resize(NCHARS);
            m_countOrder.resize(NCHARS);
        }
        bool operator()(int a, int b) { // comparator for sort
            return m_letters.at(a).m_count > m_letters.at(b).m_count;
        }
        int getCount(uchar c) { return m_letters.at(c).m_count; }
        void count(const char* filename);
        void read(const char* filename);
        std::ostream& prnByCount(std::ostream& os);
    };
    
    void FreqCntr::read(const char* filename) {
        std::ifstream is(filename); //, std::ios::binary);
        for (char c = 0; is.get(c); m_total++)
            m_letters.at(static_cast<int>(static_cast<uchar>(c))).inc();
    }
    
    void FreqCntr::count(const char* filename) {
        for (int i = 0; i < NCHARS; i++) {
            m_letters.at(i).set(static_cast<uchar>(i));
            m_countOrder.at(i) = i;
        }
        read(filename);
        std::sort(m_countOrder.begin(), m_countOrder.end(), *this);
    }
    
    std::ostream& FreqCntr::prnByCount(std::ostream& os) {
        int i = 0;
        for ( ; i < FreqCntr::NCHARS; i++) {
            Letter &lt = m_letters.at(m_countOrder.at(i));
            if (lt.m_count == 0) break; // only print non-zero counts
            lt.prn(os, m_total);
        }
        os << "Total characters: " << m_total << '\n';
        os << "Different characters: " << i << '\n';
        return os;
    }
    
    int main(int argc, char** argv) {
        if (argc < 2) { std::cerr << "Needs a filename.\n"; return 1; }
        FreqCntr fc;
        fc.count(argv[1]);
    // Can access by count order:
        fc.prnByCount(std::cout);
    // Can access by character:
        std::cout << "Number of semicolons: " << fc.getCount(';') << '\n';
    }

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 2
    Last Post: 04-02-2013, 05:19 AM
  2. difference between data object and data structure?
    By c_lady in forum C++ Programming
    Replies: 2
    Last Post: 02-22-2011, 12:30 PM
  3. Data structure for storing serial port data in firmware
    By james457 in forum C Programming
    Replies: 4
    Last Post: 06-15-2009, 09:28 AM
  4. data structure design for data aggregation
    By George2 in forum C# Programming
    Replies: 0
    Last Post: 05-20-2008, 06:43 AM
  5. Data Structure in C++
    By Accident Prone in forum C++ Programming
    Replies: 2
    Last Post: 05-10-2002, 11:02 AM