Thread: vector or linked list?

  1. #1
    Banned
    Join Date
    Aug 2017
    Posts
    861

    vector or linked list?

    (I'll try to keep this short and simple)

    I am just starting research on what should I use in C++. The scenario is. Open dir / sub dir, read in file names, store the absolute path and filename for later use. an array of type vector should be able to do this, Yes?

    But unlike arrays, their size can change dynamically, with their storage being handled automatically by the container.
    vector - C++ Reference

    If this vector has the ability to dynamically grow due to not knowing what size is needed ahead of time, and each time the program is ran it can be different it should be able to be used to just gather up filenames with there paths and store them for later use and traverse down the list of names (both ways). even do a compare name then return if found. even select a specif element by a number when imputed.

    mock code, so don't get all techy on it.
    Code:
    print(what number file wanted>);
    
    num = getNum()
    
    fileName =  vector.at(num);
    that too would be faster then traversing down a list until found, then return data.

    no real size limits? 10,000,000 files isn't too big even?

    what are the pros, cons ?

  2. #2
    Guest
    Guest
    I have never used a linked list, but they have a bad image for all I can tell. I definitely don't see how they'd be preferable for a plain list of files.

    A vector would be fine for what you want.

    From what I understand, your vector would not change after you have filled it. In that case, you should sort it. Then you can do binary search which is fast.

    Indexing by number works too if you supply one, sure!

    Alternatively, you could consider a std::unordered_map for good search performance. However, a vector will be more space efficient.

    10,000,000 files isn't too big even?
    Depends on how long those paths and filenames are and your available memory of course.

    p.s. C++ appears to have filesystem support in the standard library now (neat!). I haven't used it and no idea if it would help or complicate things for you. I assume you'll be querying the directory using a C API?
    Last edited by Guest; 09-30-2017 at 09:00 AM.

  3. #3
    Banned
    Join Date
    Aug 2017
    Posts
    861
    Quote Originally Posted by Guest View Post
    I have never used a linked list, but they have a bad image for all I can tell. I definitely don't see how they'd be preferable for a plain list of files.

    A vector would be fine for what you want.

    From what I understand, your vector would not change after you have filled it. In that case, you should sort it. Then you can do binary search which is fast.

    Indexing by number works too if you supply one, sure!

    Alternatively, you could consider a std::unordered_map for good search performance. However, a vector will be more space efficient.


    Depends on how long those paths and filenames are and your available memory of course.

    p.s. C++ appears to have filesystem support in the standard library now (neat!). I haven't used it and no idea if it would help or complicate things for you. I assume you'll be querying the directory using a C API?
    yeah a check for length of path of course, and no updating files whence gotten it will not grow or shrink.

    thanks, finally a c++ reference for what it has. I book mark them pages. all of what I have been googling is showing how to in C for C++ more or less. I already got code that opens and reads base dir files, without sub dir + files, but that filesystem support looks really interesting.

  4. #4
    Guest
    Guest
    I already got code that opens and reads base dir files
    Good, I think you should stick to that for now. I had assumed std::filesystem had broad compiler support since it has been standardized but that's not the case, so your mileage may vary.

    Regarding memory use, if we assume that you use 2 std::string per entry (path + filename) and that the average path+filename is 64 characters long, then we might say that each entry uses (2*32+64) = 128 bytes. So 10 million entries would consume ~1.2GiB memory. The compiler might optimize short strings, which typical filenames qualify for, so this is a probably a pessimistic estimate.

  5. #5
    Registered User
    Join Date
    May 2010
    Posts
    4,632
    I had assumed std::filesystem had broad compiler support since it has been standardized but that's not the case, so your mileage may vary.
    Well since this is a C++17 feature you may need to wait until C++17 is actually approved before you could even begin to assume broad compiler support. If you want support prior to C++17 you may want to consider boost/filesystem.

    By the way I agree that a vector would probably be the best choice for the container.


    Jim

  6. #6
    Banned
    Join Date
    Aug 2017
    Posts
    861
    Quote Originally Posted by jimblumberg View Post
    Well since this is a C++17 feature you may need to wait until C++17 is actually approved before you could even begin to assume broad compiler support. If you want support prior to C++17 you may want to consider boost/filesystem.

    By the way I agree that a vector would probably be the best choice for the container.

    Jim
    yeah ,...

    Code:
    #include <iostream>
    #include <algorithm> 
    #include </usr/include/c++/5.3.0/experimental/filesystem>
    #include <string>
    #include <vector>
    /*
    #include <dirent.h>
    #include <sys/types.h>
    */
    #include "files.h"
    
    void print_directory(char *path)
    {
      std::string spath;
      spath = path;
    
      for (auto & p : std::filesystem::directory_entry(spath))
              std::cout << p << std::endl;
    
    }
    gets this:
    Code:
    [Running] cd "/media/data/C-Projects/VSC/C++/mh5000/" && g++ -std=gnu++11 *.cpp 
    files.cpp: In function 'void print_directory(char*)':
    files.cpp:19:24: error: 'std::filesystem' has not been declared
       for (auto & p : std::filesystem::directory_entry(spath))
    whereas other experimental do not give such error, ie. #include <algorithm>
    because I do have that,
    Code:
    bash-4.3$ ls /usr/include/c++/5.3.0/experimental
    algorithm  chrono      fs_dir.h  fs_ops.h   functional    ratio         string_view.tcc  tuple
    any       filesystem  fs_fwd.h  fs_path.h  optional    string_view  system_error     type_traits
    I do agree, vector too, saves a lot of coding too.

    seen what filesystem has but now I'm stuck for a bit trying to get Slackware Linux updated maybe to GCC 7.xx , maybe
    Last edited by userxbw; 09-30-2017 at 04:49 PM.

  7. #7
    Registered User
    Join Date
    May 2010
    Posts
    4,632
    [Running] cd "/media/data/C-Projects/VSC/C++/mh5000/" && g++ -std=gnu++11 *.cpp
    If you have any hope of getting std::filesystem to work you need to specify at least -std=c++14, but your compiler is really fairly old and probably won't support even C++14 completely much less C++17. Also filesystem should probably be in std::experimental if you're using a more modern compiler.

    By the way you should be compiling with more compiler flags to enable more warnings to be generated I suggest g++ -Wall -Wextra -pedantic -pedantic-errors -Wvla -g -std=c++11 as a minimum. If you upgrade your compiler then I would recommend -std=c++14 as the minimum.

    I really suggest you don't use the -std=gnu(anything) stick with standard C++ and avoid any of the compiler hacks that are allowed with the gnu settings.

    Jim

  8. #8
    Banned
    Join Date
    Aug 2017
    Posts
    861
    Quote Originally Posted by jimblumberg View Post
    If you have any hope of getting std::filesystem to work you need to specify at least -std=c++14, but your compiler is really fairly old and probably won't support even C++14 completely much less C++17. Also filesystem should probably be in std::experimental if you're using a more modern compiler.

    By the way you should be compiling with more compiler flags to enable more warnings to be generated I suggest g++ -Wall -Wextra -pedantic -pedantic-errors -Wvla -g -std=c++11 as a minimum. If you upgrade your compiler then I would recommend -std=c++14 as the minimum.

    I really suggest you don't use the -std=gnu(anything) stick with standard C++ and avoid any of the compiler hacks that are allowed with the gnu settings.

    Jim
    yeah I'm being lazy about it until I get it to work, per se' but yeah I already tried ++17 but it did pick up something from having to put that -std= line in it from a different file / little project I compiled.

    std::experimental : I'll give that a try. this is all new to me. so its hit and miss ...

    Code:
    In file included from /usr/include/c++/5.3.0/experimental/filesystem:40:0,
                     from files.cpp:3:
    /usr/include/c++/5.3.0/experimental/fs_dir.h:309:3: note: candidate: std::experimental::filesystem::v1::recursive_directory_iterator std::experimental::filesystem::v1::begin(std::experimental::filesystem::v1::recursive_directory_iterator)
       begin(recursive_directory_iterator __iter) { return __iter; }
    Looks like it is trying to tell me something, hehe, just got to figure it out, std::experimental::filesystem::v1::recursive_direc tory_iterator
    maybe use this one header instead,
    fs_dir.h
    Last edited by userxbw; 09-30-2017 at 05:33 PM.

  9. #9
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Have you listed all your requirements?

    Vector vs list isn't much of a choice when there are many others to choose from (not including anything you can custom design)
    Containers library - cppreference.com

    Do you intend to sort the list after you've read the files from the file system?
    Don't assume that any order that comes from the file system is compatible with string comparison necessary for say a binary search.

    How is the user supposed to 'know' an index number for a file?
    Does your UI list all the files with a handy index number next to it?

    Storing the whole path for every file is a massive overhead if you have lots of files in deep directories. I figure your 10M files aren't all in the root directory.

    Vectors typically waste 25% of the allocated space.
    Code:
    #include<iostream>
    #include<vector>
    using namespace std;
    
    int main()
    {
      vector<int> a;
      vector<int>::size_type s = a.capacity();
      for ( int i = 0 ; i < 10000000 ; i++ ) {
        a.push_back(i);
        vector<int>::size_type q = a.capacity();
        if ( s != q ) {
          cout << "At index << " << i <<
                  ", capacity increased from " << s <<
                  " to " << q << endl;
          s = q;
        }
      }
    }
    
    $ g++ foo.cpp
    $ ./a.out 
    At index << 0, capacity increased from 0 to 1
    At index << 1, capacity increased from 1 to 2
    At index << 2, capacity increased from 2 to 4
    At index << 4, capacity increased from 4 to 8
    At index << 8, capacity increased from 8 to 16
    At index << 16, capacity increased from 16 to 32
    At index << 32, capacity increased from 32 to 64
    At index << 64, capacity increased from 64 to 128
    At index << 128, capacity increased from 128 to 256
    At index << 256, capacity increased from 256 to 512
    At index << 512, capacity increased from 512 to 1024
    At index << 1024, capacity increased from 1024 to 2048
    At index << 2048, capacity increased from 2048 to 4096
    At index << 4096, capacity increased from 4096 to 8192
    At index << 8192, capacity increased from 8192 to 16384
    At index << 16384, capacity increased from 16384 to 32768
    At index << 32768, capacity increased from 32768 to 65536
    At index << 65536, capacity increased from 65536 to 131072
    At index << 131072, capacity increased from 131072 to 262144
    At index << 262144, capacity increased from 262144 to 524288
    At index << 524288, capacity increased from 524288 to 1048576
    At index << 1048576, capacity increased from 1048576 to 2097152
    At index << 2097152, capacity increased from 2097152 to 4194304
    At index << 4194304, capacity increased from 4194304 to 8388608
    At index << 8388608, capacity increased from 8388608 to 16777216
    std::vector - cppreference.com
    Insertion or removal of elements at the end - amortized constant O(1)
    In order to achieve this amortization, the usual strategy is to double the amount of space each time.

    Also, vectors are constrained to be in contiguous memory. This is usually fairly low cost if all you're doing is extending a single vector with a POD type. But filling each vector entry with dynamic objects like strings (which cause further allocations) is likely to result in your entire vector being moved every time it's extended.

    Or call .reserve() with a big number to start with.

    But vector will give you the O(1) index capability and O(Log(n)) search capability (after you sort it). Again, we're back to your requirements.

    Walking the directory tree is in the FAQ.
    FAQ > Accessing a directory and all the files within it - Cprogramming.com
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  10. #10
    Banned
    Join Date
    Aug 2017
    Posts
    861
    Quote Originally Posted by Salem View Post
    Have you listed all your requirements?

    Vector vs list isn't much of a choice when there are many others to choose from (not including anything you can custom design)
    Containers library - cppreference.com

    Do you intend to sort the list after you've read the files from the file system?
    Don't assume that any order that comes from the file system is compatible with string comparison necessary for say a binary search.

    How is the user supposed to 'know' an index number for a file?
    Does your UI list all the files with a handy index number next to it?

    Storing the whole path for every file is a massive overhead if you have lots of files in deep directories. I figure your 10M files aren't all in the root directory.

    Vectors typically waste 25% of the allocated space.
    Code:
    #include<iostream>
    #include<vector>
    using namespace std;
    
    int main()
    {
      vector<int> a;
      vector<int>::size_type s = a.capacity();
      for ( int i = 0 ; i < 10000000 ; i++ ) {
        a.push_back(i);
        vector<int>::size_type q = a.capacity();
        if ( s != q ) {
          cout << "At index << " << i <<
                  ", capacity increased from " << s <<
                  " to " << q << endl;
          s = q;
        }
      }
    }
    
    $ g++ foo.cpp
    $ ./a.out 
    At index << 0, capacity increased from 0 to 1
    At index << 1, capacity increased from 1 to 2
    At index << 2, capacity increased from 2 to 4
    At index << 4, capacity increased from 4 to 8
    At index << 8, capacity increased from 8 to 16
    At index << 16, capacity increased from 16 to 32
    At index << 32, capacity increased from 32 to 64
    At index << 64, capacity increased from 64 to 128
    At index << 128, capacity increased from 128 to 256
    At index << 256, capacity increased from 256 to 512
    At index << 512, capacity increased from 512 to 1024
    At index << 1024, capacity increased from 1024 to 2048
    At index << 2048, capacity increased from 2048 to 4096
    At index << 4096, capacity increased from 4096 to 8192
    At index << 8192, capacity increased from 8192 to 16384
    At index << 16384, capacity increased from 16384 to 32768
    At index << 32768, capacity increased from 32768 to 65536
    At index << 65536, capacity increased from 65536 to 131072
    At index << 131072, capacity increased from 131072 to 262144
    At index << 262144, capacity increased from 262144 to 524288
    At index << 524288, capacity increased from 524288 to 1048576
    At index << 1048576, capacity increased from 1048576 to 2097152
    At index << 2097152, capacity increased from 2097152 to 4194304
    At index << 4194304, capacity increased from 4194304 to 8388608
    At index << 8388608, capacity increased from 8388608 to 16777216
    std::vector - cppreference.com

    In order to achieve this amortization, the usual strategy is to double the amount of space each time.

    Also, vectors are constrained to be in contiguous memory. This is usually fairly low cost if all you're doing is extending a single vector with a POD type. But filling each vector entry with dynamic objects like strings (which cause further allocations) is likely to result in your entire vector being moved every time it's extended.

    Or call .reserve() with a big number to start with.

    But vector will give you the O(1) index capability and O(Log(n)) search capability (after you sort it). Again, we're back to your requirements.

    Walking the directory tree is in the FAQ.
    FAQ > Accessing a directory and all the files within it - Cprogramming.com
    well for one I have no idea of all of my options. the requirements are to just "stuff" absolute path and filenames into a container for later retrieval, size of list is unknown. so yeah, 10,000,000 could be posable. sort or not sort then go down the list one at a time return path/file, or randomize the list then retrieve one filename, do it again. or by a number attached to the file, each file having it own number. sequential would be best, I think. that would be where that array part would come in, just use the N-1 for retrieving it.

    whence loaded it stays that way, noting added, nothing removed, just manipulated by sorting, random, alphabetically, and first come, first serve how ever this gets loaded from intake out of the directory into the list, and what ever else that is available perhaps.

    this project is all new to me in the form of C++ and what it has to offer.

    as far as using the experimental stuff within C++. Thought took me to, if I were to get it to work, if then when someone else tried to compile it, would that not cause posable issues if they to not have the same version compiler and needed files? therefore, limiting its ability to be used elsewhere.
    Last edited by userxbw; 10-01-2017 at 07:10 PM.

  11. #11
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    I suppose one key lesson for you to learn is encapsulation.
    Code:
    class myFileStorage {
      private:
        // vector<string> mFilenames;
        // list<string> mFilenames;
      public:
        void add(string f);
        void sort();
        string index(int n);
        string search(string s);
    };
    You define an interface, and then the rest of the program doesn't care whether the innards are a vector or a list (or a tree, or set, or map, or ...)

    If you choose vector to start with, but find later it just isn't working, then you have minimal rework to recode the class implementation to use list (for example).
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  12. #12
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    You may want to consider how to handle the ordering of files and sub-directories within a directory. In the display and copy utilities I've written, I use two main loops for each directory. The first loop handles all the files, then the second loop handles the sub-directories, making a recursive call for each sub-directory encountered. This keeps all the files within a directory next to each other in an ordered list.

    If including support for NTFS, the program will need to deal with reparse points. Since these are links to other existing directories or files, a program can just avoid recursing into a reparse point sub-directory.

    NTFS reparse point - Wikipedia

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 13
    Last Post: 09-22-2013, 10:34 PM
  2. Replies: 4
    Last Post: 05-01-2010, 10:19 PM
  3. single linked list to double linked list (help)
    By Countfog in forum C Programming
    Replies: 8
    Last Post: 04-29-2008, 08:04 PM
  4. Sorting a linked list (or vector)
    By Opel_Corsa in forum C++ Programming
    Replies: 8
    Last Post: 11-19-2006, 08:47 AM
  5. Replies: 6
    Last Post: 03-02-2005, 02:45 AM

Tags for this Thread