Thread: arrays vs lists? And containers in general!

  1. #1
    Registered User
    Join Date
    Sep 2007
    Location
    Arizona
    Posts
    164

    arrays vs lists? And containers in general!

    I am just 3 chapters away from finishing this class on data structures and...the book has gone into great detail about using the different containers - arrays, lists, linked lists, queues, stacks, etc. But so far, there hasn't been any discussion, at least that I can recall or find, that discusses a compare-contrast of each, or the appropriate situations where you would choose to use one instead, and what those specific applications are.

    So is it all a matter of choice and what you are comfortable with using, or are there specific times/situations where one would be better to use that the other.

    I have had this question since the 3rd chapter and expected the topic to be covered, and maybe it will be in one of the last few chapters, but by their headings I don't think it is so.

    I welcome any and all information on this topic. It seems like when to use each, in comparison to others, should be discussed as much as how to use them, especially in a classroom, virtual as it may be, setting.

    Thanks for the input.

  2. #2
    Deathray Engineer MacGyver's Avatar
    Join Date
    Mar 2007
    Posts
    3,210
    If you know the exact size of a needed block of memory at compile time, an array declared on the stack is probably best. It's the fastest data type, and the easiest to work with.

    If you will know the exact size of a needed block of memory at runtime, and the size will not change, and if the duration of the memory will be short, then you might want to consider using an array on the heap.

    If the elements in your block of memory will constantly be expanding at an unknown value, then depending upon other factors, other data types such as linked lists, std::vector, etc. are probably a better option. As an example of other things to consider.... Linked lists are probably a bad idea if you're going to be re-ordering the data all the time.

    Queues and stacks are special in the sense that they are useful for solving specific problems.

    All in all, it depends upon what problem you're trying to solve. You won't necessarily kill your program if you use a std::vector instead of an array, but if you add in a bunch of extra complexity by trying to dynamically resize blocks of memory with pointers, you do run the risk of making a mistake that could cost you time and energy spent in debugging. The STL is there to help you out and provide a lot of tested code. Less effort on your end to do more.

    I'm not saying don't practice writing your own containers if you want to go for it for the practice. Just remember to try to always have reasons for what you do.

  3. #3
    Registered User
    Join Date
    Sep 2007
    Location
    Arizona
    Posts
    164
    Well, I appreciate your comments MacGyver.
    When we were given assignments to write code, I ALWAYS used the STL containers when I wasn't explicitly forbidden. I felt it was the easiest approach and offered the least number of lines to write, so that is a given for me. But, some of the gyrations we have been put through when using the STL was available was...educational, I guess. I understand that a class needs you to do things for the sake of learning, but...there hasn't been any coverage on when a specific container would be the "best" option, even in the examples offered in the text.

  4. #4
    Deathray Engineer MacGyver's Avatar
    Join Date
    Mar 2007
    Posts
    3,210
    I wouldn't worry about that to be honest. The important thing is to accept that you need to learn all the data types and be familiar enough to work with them when needed. Eventually, the more you learn about them, the more obvious it will be when to use what.

    It's sort of like learning what a pointer is and how to use it. Many examples shown are usually trivial and not the ideal time to really use them. When students actually understand pointers from those simple examples, some get totally suspicious and wonder what the reasons behind pointers actually are. Some go so far as to say pointers aren't even useful.

    It's the same with these data types. They are useful, but you have to master using them in situations that may not seem the best place to you at first. Think about a simple program. If I ask you to write a program to read in 10 integers, you'd probably use an array, right? If I asked you to read an unknown number of integers, but the first integer would be the total number of integers to read, you could use allocate a block of n ints by way of new. The higher the complexity of the program I ask you to write, the more you should be relying on more advanced data types.

    As far as stacks and queues go.... their usages become more obvious when you have to use them to solve problems. Both stacks and queues can be implemented with arrays or linked lists.

  5. #5
    Registered User
    Join Date
    Sep 2007
    Location
    Arizona
    Posts
    164
    The reason I even made this post is because the chapter I am reading now, is on sorting algorithms, which came after searching algorithms. The programming assignments for the searching chapter were all about rewriting the iterative search loops as, guess what, recursive search functions! Now sorting, which logically follows searching because sorting uses searching to sort, is taking each kind of sort and applying it to the different types of containers (mostly array-based lists and linked list based lists) while getting progressively faster with the sort mechanisms using the Big O analysis on key comparisons and element moving. Then our programming assignments are to write the selection sort and insertion sort functions for the other type of container than the one used as the example in the text. Nothing is focusing on the faster sorting mechanisms at the programmer’s disposal. To me, it is akin to a calculus class that focuses on the algebra portions of the calculations not the understanding and use of the calculus.

    So, I started thinking about the "uses" of all the containers studied and how the application/use hasn't really been a focus. And I couldn't explain to someone when each would be a "best use" for a specific case, except for the size (fixed or dynamic) at compile time. That is when I decided to post my question.
    Last edited by clegs; 12-02-2007 at 01:34 PM.

  6. #6
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    Some things that affect the type of container you use include whether you want the elements sorted, stable or neither, whether you need random access to the elements, whether you need bidirectional access, whether you will be adding elements in the middle, at the front, at the back, or some combination, whether you will be deleting elements and if those deletions will occur in the middle, at the front, at the back or some combination, whether you know how many elements will be held by the container, and probably many other things.

    When you answer all those questions about your problem, then you look at how each container handles each possibility and decide what is most important.

    Sometimes the answers force your hand. For example, if you need a lot of bidirectional access, then a singly-linked list would provide horrible performance, so you can probably eliminate that option. Other times you just have to weigh the benefits of each, and maybe either solution is fine.

    Let's take an example. Suppose you are assigned to read in a text file, then output a list of all strings found in the file without showing duplicates (the output doesn't have to be in any particular order). How would you answer these questions:
    1. Will you know how many elements will be in the container before you start filling it?
    2. Does the container need to be sorted?
    3. Do the elements in the container need to stay in the same order that you add them?
    4. If the container needs to retain the order you provide, where will elements be inserted (front, middle, back, or some combination)?
    5. Will you be deleting elements?
    6. If you will be deleting elements, where will the deletions occur (front, middle, back, or some combination)?
    7. Do you need to be able to iterate forward over the container?
    8. Do you need to be able to iterate backward over the container?
    9. Do you need to be able to randomly find a value in the container?


    If you want, I'll let you have a go at answering those questions. Then I'll give my answers and what I think the best solution(s) will be.

  7. #7
    Registered User
    Join Date
    Sep 2007
    Location
    Arizona
    Posts
    164
    Cool!
    This is what isn't addressed in the book, and I am not sure I will come up with something even close to being "right."
    1. I don't think so, you don't know what is contained until it's read.
    2. Not by the initial parameters given, but they need to be checked for duplicates, so there has to be some sort of order, or check mechanism, to do that. I am not sure if a check at read, or a sort after filling is the best option. I am assuming a sort after filling, and then compare to string at either side is the fastest.
    3. No because if the duplicates are eliminated, that implies there will be movement, in both directions, of some form.
    4. The container doesn't need to retain the original order.
    5. Deleting is required because duplicates are eliminated.
    6. Deleting will occur where ever there is a duplicate detected.
    7. Yes, at least at output.
    8. I don't think so.
    9. Yes, if you are looking for duplicates.
    I am not sure if I am even close, but I gave it a shot.
    I do not know what the best container is based on the info and algorithm efficiency. Dang! This is the kind of stuff I need to learn. I hope there is another class in my future that will address this.

    Let me have it, where did I go wrong?

  8. #8
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by clegs View Post
    I am just 3 chapters away from finishing this class on data structures and...the book has gone into great detail about using the different containers - arrays, lists, linked lists, queues, stacks, etc. But so far, there hasn't been any discussion, at least that I can recall or find, that discusses a compare-contrast of each, or the appropriate situations where you would choose to use one instead, and what those specific applications are.
    You are confusing containers with USES of contains. Arrays and lists are containers. Queues and stacks are specific USES of these containers. You can implement a queue or a stack using an array or a list.

  9. #9
    Registered User
    Join Date
    Sep 2007
    Location
    Arizona
    Posts
    164
    It got me thinking about what I have in my head about what I have learned. But I also found myself getting confused because I would think. "if I did it this way, then..., but I could do it this way and then..." Flip-flopping about like that is what I tend to do and I end up confusing myself with the approach I want to take. But, I think that a lot of that is because I have only so much time to complete the class and I have an imaginary clock ticking away in my head that causes me to loose focus before I complete a though on the process. I am aware of that, and know that is something I have to work on. Getting clear on all the options available and those that work best for a given situation is something that I do not have, and I know that.

    One thing I do do is, revisit my code up until I have to turn in the assignment. For me looking at a working program allows me to think through the process I have wriitten and find a shorter route. I also help other students who post on the school BB that they are having problems with their code. I enjoy doing that a lot. Debugging code I didn't write and figuring out what they meant to do and what they actually have happening. That is fun to me.

  10. #10
    Registered User
    Join Date
    Sep 2007
    Location
    Arizona
    Posts
    164
    You are confusing containers with USES of contains. Arrays and lists are containers. Queues and stacks are specific USES of these containers. You can implement a queue or a stack using an array or a list.
    I thought they were subsets of, at least that is what I gleaned from the text "implement stack as a..." and "implement queue as a...." Thanks!
    Last edited by clegs; 12-02-2007 at 02:49 PM.

  11. #11
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by clegs View Post
    I thought they were subsets of, at least that is what I gleaned from the text. Thanks!
    You could look at it as a subset I guess. I tend to think of them more as an application of basic data structures.

  12. #12
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    eg, it wouldn't be a good idea to use vector/array when you are expecting to insert/delete values in the middle (on average you have to move half the array one space, hence the linear time). A linked list would be a better idea because you can insert elements in constant time (without having to move any existing element). On the other hand, vector/array gives you constant time random access (data is always at pointer + subscript) whereas to access a random element in a linked list, you have to iterate through half the list on average, hence the linear time.

    Therefore, one of the factors u have to consider when choosing a container is the expected usage.

  13. #13
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    >> I am not sure if I am even close, but I gave it a shot.
    I think you did really well.

    I think there are a couple options for removing duplicates. You can add all words and then sort and delete duplicates, or you can check the container first before adding and skip a word if it is a duplicate. The first option probably requires the container to be sortable after it is filled and requires deletes anywhere in the container. The second one requires random access but doesn't require deletes. We'll look at both options.

    Ok, so now given those answers, let's figure out what type of container is best. I'll focus on arrays, lists, hash tables, and binary trees, although there are many other options. Let's look at each of those four container types both for removing duplicates at the end and for skipping the insertion of duplicates.

    Array with removing duplicates:
    • Dynamic size is possible, but requires reallocation strategies (like the ones a vector uses) - ok
    • Sorting a full array one time is pretty fast and better than keeping it sorted as you go - ok
    • Deletion from the middle of an array is expensive, and removing duplicates will require that deletion - bad


    Array with skipping duplicates:
    • Dynamic size is possible, but requires reallocation strategies (like the ones a vector uses) - ok
    • Requires sorting because the array is indexed by a number and we are storing strings. So to achieve the solution we'd have to keep the array sorted. Finding the insertion point of a sorted array would be O(log n) so keeping it sorted isn't too bad - ok
    • It would require insertion in the middle, though, which is not particularly fast with an array - bad


    List with removing duplicates:
    • Dynamic size is standard - good
    • Sorting a linked list one time is slower than for an array because the list lacks good random access - bad
    • Deletion from the middle of linked list is easy - good


    List with skipping duplicates:
    • Dynamic size is standard - good
    • Random access on a list is linear. You could keep the list sorted but that would be just as bad because inserting into a sorted list is linear anyway - very bad


    Hash table with removing duplicates:
    • Because inserts always find the same location, this implementation will be exactly the same as a hash table with skipping duplicates, except we will have to go back and remove duplicates at the end. That makes no sense, so this will not be an option - very bad


    Hash table with skipping duplicates:
    • Dynamic size is difficult and could cause bad worst case performance - bad
    • Random access is fastest here assuming a properly sized table - good


    Binary tree with removing duplicates:
    • Because inserts always find the same location, this implementation will be exactly the same as a binary tree with skipping duplicates, except we will have to go back and remove duplicates at the end. That makes no sense, so this will not be an option - very bad


    Binary tree with skipping duplicates:
    • Dynamic size is standard - good
    • Random access is o(log n) which is good - good


    So looking at all the options, we can eliminate list with skipping duplicates, hash table with removing duplicates and binary tree with removing duplicates.

    That still leaves 5 options.

    If you think about it, the array with skipping duplicates is virtually the same as the binary tree. You have to keep the array sorted by doing a binary search for each insertion point. That's what a binary tree does automatically. However, a binary tree has much faster inserts and scales dynamically automatically. So I'd choose a binary tree over an array with skipping duplicates. That leaves only the array with removing duplicates as a possibility.

    What about a linked list? The deletions will be faster than with an array but the sorting will be slower. Which is more important?

    A binary tree looks good, it can grow dynamically and has good (but not great) random access.

    A hash table has great random access in the best case, but what if we don't know the number of unique words we'll be getting?



    Putting that together, the best answer still depends on what information we have about the input.

    If we know a good estimate of how many unique words we will get before we start, then the hash table would be my first choice. This might be true for example if the input files had a consistent ratio of unique words to total words, and a consistent ratio of total words to file size. We would then use the file size to estimate how big to make the hash table.

    If those ratios fluctuated a lot, I'd be hesitant to use a hash table because of the danger of either hitting the worst case if we estimate too small or wasting memory if we estimate too big.

    So then we would have an array or list that gets filled up with all the words, sorted, then duplicates are deleted, or a binary tree where duplicates are skipped on insertion. If we know that the number of duplicates will be big, then the slow array deletion rules that out. Also, that means that the array or list is filled with many values and uses much more memory than the binary tree, which only holds the unique values. So in that case I'd say the binary tree wins.

    Finally, if the number of deletes is small, then the array is probably faster than the list because the sort is faster. The question is whether the array is better than the binary tree. In this case, insertion at the end of the vector and sorting at the end is probably a little faster than maintaining a sorted order with the binary tree. However, the insertion into the binary tree automatically handles the duplicates, whereas the array will require an extra pass at the end to delete those duplicates. I would think that even with only a few duplicates that it would not be worth it to do that, so I'll choose binary tree here again.

    That's good because if we don't have any idea what the ratio of duplicates to unique words is, we can comfortably use the binary tree.

    Conclusion:

    I like the hash table if we can estimate the number of unique words fairly well, and the binary tree otherwise. In C++, that means std::tr1::unordered_set for the hash table or std::set for the binary tree.

    Of course, in practice, once you make that decision, you encapsulate it well and when you are mostly finished you profile the code to see if the performance is satisfactory. If it isn't, then you can try the other options. There can be many other factors that might make another option perform noticeably better in the real world. For now, though, you want to learn how to make the best theoretical decision so you will get off on the right foot when you start those real world problems.
    Last edited by Daved; 12-02-2007 at 05:27 PM.

  14. #14
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Some thing inexperienced people always forget when evaluating container suitability is to think about the frequency and place in time of the operations. For example, they want the data structure to be always ordered and to allow for lookup by value, and also iteration, and insertion in the middle. They choose a binary tree, which supports all these operations with good complexity factors.

    But what they really meant was: I need to insert a lot of elements at the start of the program, and after that an occasional one in the middle, and I iterate over the elements all the time. Suddenly, a sorted vector is far more attractive.

    Pick your container for the really common operations, not for all that might occur.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  15. #15
    Registered User
    Join Date
    Sep 2007
    Location
    Arizona
    Posts
    164

    Lightbulb

    THANK YOU SO MUCH DAVED!

    This is the exact kind of thing that I wanted/expected to see when the text was presenting these different data structures. But, it wasn't addressed at all. I would have liked to see a section in each chapter that made a contrast/compare reference to each of the structures presented before it. That would help in providing a practical application that makes the material sink in and stick for me.

    I haven't read about binary trees yet, that chapter is next and the sections covering hashing were brief and part of the chapter on search algorithms. The final two chapters cover graphs and the STL associative containers.

    This explanation you gave took a huge amount of your time. I appreciate your efforts immensely. Thanks.

    When I take a class I don't want to get a good grade for the sake of, I want to learn the subject matter and have a dialog about options and constraints of the material presented. I have tried with the instructor for this course, but it hasn't worked well. So, instead I research this board (which is the best of the ones I have visited) and its tutorials. Your explanation has given me specific considerations involved in deciding which container to use. This is really great!! Thanks, have I said thanks enough? I mean it - thank you.

    This is good. Really, really good. Thanks
    Last edited by clegs; 12-02-2007 at 06:55 PM.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Noob STL question about Containers.
    By Swerve in forum C++ Programming
    Replies: 2
    Last Post: 03-15-2009, 12:02 PM