Thread: Extracting data from streams

  1. #1
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654

    Extracting data from streams

    I am currently aghast at the lack of support for reading information from files in C++.
    I can't find a good way to read a chunk of data from them.

    My current goal is to be able to iterate over the contents of a file using random access/forward iterators (because I need ++, += and []), which, of course, stream iterators don't support (pathetic things).
    So I thought I'd implement a buffered stream iterator that reads chunks of data from the file and allows for random access to it.
    What I'm stuck on, however, is reading those chunks of data from the file.
    I have two approaches:

    (IndiceType is a typedef for std::size_t.)
    Code:
    IndiceType BufferSize = m_Buffer.size();
    m_Buffer.resize(BufferSize == 0 ? 1 * MB : BufferSize * 2);
    m_ItEnd = m_Buffer.size();
    m_pFile->read((&*m_Buffer.begin() + m_ItCurrent),
    	std::streamsize(m_ItEnd - m_ItCurrent));
    This approach uses a vector and indices to manually resize the vector and read a chunk of data from the file. Unfortunately, I don't know how much data it reads.

    (T is a template type for char or wchar_t.)
    (BufType is a typedef for std::vector<T>.)
    Code:
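    std::istream_iterator<T, T> SrcIt(*m_pFile);
    std::istream_iterator<T, T> EndIt;
    std::back_insert_iterator<BufType> DstIt(m_Buffer);
    IndiceType i = 0;
    // Test for end-of-stream before dereferencing, and count every element copied.
    // Note: the final ++ may already have extracted one more element from the stream.
    for (; SrcIt != EndIt && i < 1 * MB; ++i)
    	*DstIt++ = *SrcIt++;
    m_ItEnd += i;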
    Another approach is this. Basically it's std::copy with an istream_iterator and a back_insert_iterator. But it's painfully slow, so it's out of the question, even though it works.
    Another thing that bugs me is that I can't use std::copy itself, because I can't get an iterator that is 1 MB away from the begin iterator, since istream_iterator doesn't support operator +.
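
    One possible alternative, sketched under assumptions that are not in the original post: std::istreambuf_iterator reads raw characters straight from the stream buffer, whereas std::istream_iterator goes through operator>> (which, for character types, also skips whitespace unless std::noskipws is set). That tends to make it cheaper per character, though a block read() is usually faster still. The helper name CopyAtMost and the plain char stream are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: append at most maxCount characters from `in` to `out`.
// Returns the number of characters actually appended.
std::size_t CopyAtMost(std::istream& in, std::vector<char>& out, std::size_t maxCount)
{
    std::istreambuf_iterator<char> src(in);
    std::istreambuf_iterator<char> end;   // default-constructed == end-of-stream
    std::size_t copied = 0;
    // Check for end-of-stream before every dereference.
    while (copied < maxCount && src != end)
    {
        out.push_back(*src);   // peek at the current character
        ++src;                 // consume it
        ++copied;
    }
    assert(copied <= maxCount);
    return copied;
}
```

    Because the count is tracked by hand, no operator + on the iterator is needed, and whitespace in the file ends up in the buffer unchanged.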

    So any good ideas on how to do this? C++ is worse in this area than C is, really.
    It's truly a shame.

    Member variables in the class that code uses:
    Code:
    BufType m_Buffer;
    IndiceType m_ItBegin;
    IndiceType m_ItCurrent;
    IndiceType m_ItEnd;
    std::basic_istream<T>* m_pFile;
    bool m_End; // Is *this == end()?
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  2. #2
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by Elysia View Post
    This approach uses a vector and indices to manually resize the vector and read a chunk of data from the file. Unfortunately, I don't know how much data it reads.
    Check istream::gcount() after each read operation. That should put you on track.
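
    A minimal sketch of that suggestion, assuming a plain std::istream and a std::vector<char> buffer. ReadChunk is a hypothetical helper name, not something from the original class. After read(), gcount() reports how many characters were actually extracted, so the buffer can be trimmed back to the valid data.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: append up to chunkSize characters from `in` to `buffer`.
// Returns the number of characters actually read.
std::size_t ReadChunk(std::istream& in, std::vector<char>& buffer, std::size_t chunkSize)
{
    assert(chunkSize > 0);
    const std::size_t oldSize = buffer.size();
    buffer.resize(oldSize + chunkSize);   // make room for a full chunk
    in.read(&buffer[oldSize], static_cast<std::streamsize>(chunkSize));
    // gcount() is the number of characters the last unformatted read extracted.
    const std::size_t got = static_cast<std::size_t>(in.gcount());
    buffer.resize(oldSize + got);         // drop the unused tail
    return got;
}
```

    A return value smaller than chunkSize means the stream ran out of data or hit an error; in.bad() should distinguish a real I/O error from a plain end-of-file.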
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  3. #3
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Gotcha. Thanks.

  4. #4
    int x = *((int *) NULL); Cactus_Hugger's Avatar
    Join Date
    Jul 2003
    Location
    Banks of the River Styx
    Posts
    902
    Code:
    1 * MB
    Probably gratuitous syntactical sugar, but it made me lol.
    long time; /* know C? */
    Unprecedented performance: Nothing ever ran this slow before.
    Any sufficiently advanced bug is indistinguishable from a feature.
    Real Programmers confuse Halloween and Christmas, because dec 25 == oct 31.
    The best way to accelerate an IBM is at 9.8 m/s/s.
    recursion (re - cur' - zhun) n. 1. (see recursion)

  5. #5
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Yes, it's syntactic sugar, because it's easier to read than numbers such as 1024 * 1024:
    Code:
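    typedef unsigned int ShortUnitSize_t;
    typedef unsigned long long LongUnitSize_t;
    const ShortUnitSize_t KB = 1024;
    const ShortUnitSize_t MB = KB * 1024;
    const ShortUnitSize_t GB = MB * 1024;
    // Widen to 64 bits before multiplying so the larger units cannot overflow.
    const LongUnitSize_t TB = GB * 1024ULL; // 2^40
    const LongUnitSize_t PB = TB * 1024ULL; // 2^50
    const LongUnitSize_t EB = PB * 1024ULL; // 2^60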

  6. #6
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,613
    I think he means gratuitous in the same way

    p = malloc( BUFSIZ * sizeof(char) );

    would be gratuitous.

    By the way, I just think this is a neat idea.
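
    The comparison rests on sizeof(char) being 1 by definition, so multiplying by it changes nothing, just as 1 * MB equals MB. A tiny sketch in C++; the local KB and MB constants mirror the ones posted above, and the function name is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>   // BUFSIZ

const unsigned int KB = 1024;
const unsigned int MB = KB * 1024;

// sizeof(char) is 1 by definition in both C and C++.
static_assert(sizeof(char) == 1, "sizeof(char) is always 1");

// Hypothetical helper: true when the "gratuitous" factors change nothing.
bool FactorsAreNoOps()
{
    return BUFSIZ * sizeof(char) == static_cast<std::size_t>(BUFSIZ)
        && 1 * MB == MB;
}
```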

  7. #7
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    And on my side the optimizer did its magic on it.
    Gonna steal it, Elysia.

  8. #8
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Heh. Go ahead.
    It's a public forum, after all.
