Thread: What is the best way to imlement link list of numbers?

  1. #1
    Registered User
    Join Date
    Sep 2010
    Posts
    12

    What is the best way to imlement link list of numbers?

    I am new to C++ so please go easy on the technical jargon. Have a question about linked list - or may be there is a better way to do this.
    Here is the idea and I have no idea on how to implement it. Suppose we have a large file with numbers that are entered by users which are mostly repetitive (the numbers, not the users!) Something like this:
    12345612345612345678
    12345612345612345679
    12345612345612345680
    12345612345612345681
    etc.
    Suppose this file is in gigabytes. What I want to do is to take each entered number, decompose it to individual digits, link them somehow, create pointers in memory to their position within the line and then write only the different (new) combination of numbers and their positions.
    I realize it would be easier to write the numbers as they are if this was a small amount of data. But what I am trying to do is to speed up writing and reading speed and decrease the space usage.
    Questions:
    1) What is the best way to do this? Linked lists and memory pointers? Examples?
    2) Will space and storage difference be positive?
    Thanks a lot!

  2. #2
    Bored Programmer
    Join Date
    Jul 2009
    Location
    Tomball, TX
    Posts
    428
    Not sure exactly what type of file you are reading, but this might help

    tellp - C++ Reference

    if you look to the left on the page it has difinitions of other pointer placement and pointer finding techniques in bianary and text files.

  3. #3
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by grigorianvlad
    What I want to do is to take each entered number, decompose it to individual digits, link them somehow, create pointers in memory to their position within the line and then write only the different (new) combination of numbers and their positions.
    It sounds like you want a trie, or some variant thereof. There is no such container in the standard library, so you have to implement it yourself.

    I do not think that you need to make use of Lesshardtofind's suggestion. From your example input, you can simply read line by line into a string, then traverse the characters of the string while traversing the trie. When you reach a node in the trie for which there is no child corresponding to the next character in the string, you add the new child node, and move on until the end of the string. Of course, since you will be using manual memory management, take care to implement the destructor correctly, and either implement the copy constructor and copy assignment operator correctly, or disable them.

    Quote Originally Posted by grigorianvlad
    1) What is the best way to do this? Linked lists and memory pointers? Examples?
    This will be a tree structure, so if you have experience implementing them (or more simply, linked lists), then you will find something familiar. Naturally, I suggest encapsulating it in a class. You could have an insert member function that takes a string and adds it to the trie.

    Presumably you would have a private nested class to model the nodes of the trie. Each node would have an array of pointers to its child nodes. Since each array would have at most ten elements, it may make sense to use a std::vector with linear search rather than say, a std::set where the pointers are sorted according to what they point to.

    Quote Originally Posted by grigorianvlad
    2) Will space and storage difference be positive?
    What do you mean?
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  4. #4
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    How many bits are you storing for each number? Is it less than 64?

    Does the file need to be human readable? Or is a convoluted binary file appropriate?

    Does the file need to be burst readable (multiple simultaneous processing stages)? Or is it always read sequentially?

    Does the file need completely random access? Or can read blocks be adjusted for before buffering?

    Most importantly, what are you doing with the data?

    I'm only asking because Laserlight may have led you far astray. The overhead of serializing a trie to a file has a good chance of being larger than the file you have now if you are going to have to use 64 bit pointers.

    Consider a file with two billion 32 bit integers all of which share the same 29 bit prefix. Each of the two billion 32 bit integers will need at least a 3 bit (32 - 29) index entry (assuming the trie is serialized top-down), every index entry will need at least a 3 bit parent node pointer, and every parent node pointer will need at least a 3 bit node pointer. It may make for a smaller file with such regular data, but now consider the same file format with sixteen billion entries and only 2 bits of globally shared prefix. The file is now approaching the same size as simply storing the 32 bit integers. Now consider the same file format with thirty-two billion 64 bit entries having no globally shared prefix. The file is now considerably larger than simply storing the 64 bit entries.

    So, yea, a trie may be the perfect solution depending on what you are doing, but as long as you are going to take the time asking for help, you need to explain what you are going to be doing.

    For example, you could easily store the same chunk of data in a few hundreds of mebibytes if the entries are often very nearly the same value over a thousands of entries.

    Soma

  5. #5
    Registered User
    Join Date
    Aug 2010
    Location
    Poland
    Posts
    733
    Consider using a vector/deque instead of linked list, if you need such a large amount of data. Maybe, for the following input data, create 2 different informations:

    Input:
    300
    301
    302
    303
    304
    305
    1010
    1011
    1012
    1013
    1014

    struct NumberInfo {
    long Base;
    long Count;
    }

    1:
    base=300
    count=6

    2:
    base=1010
    count=5

    It depends what portion of the data is repetitve, and whether the numbers of the sequence follow one another, or not: 3 4 5 1 2
    And how fast is the number lookup supposed to be?
    The space difference is going to be very huge. If all the numbers are repetitive there is only one NumberInfo instance, but if all of them belong to different sequences, like: 1 3 6 8, there is going to be much more memory needed.
    Last edited by kmdv; 09-15-2010 at 08:46 AM.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Link List Insert prob
    By Bebs in forum C Programming
    Replies: 8
    Last Post: 12-03-2008, 10:28 PM
  2. reading data from a file - link list
    By peter_hii in forum C++ Programming
    Replies: 7
    Last Post: 10-25-2006, 09:11 AM
  3. Please Help - Problem with Compilers
    By toonlover in forum C++ Programming
    Replies: 5
    Last Post: 07-23-2005, 10:03 AM
  4. compiler build error
    By KristTlove in forum C++ Programming
    Replies: 2
    Last Post: 11-30-2003, 10:16 AM
  5. singly linked list
    By clarinetster in forum C Programming
    Replies: 2
    Last Post: 08-26-2001, 10:21 PM