Thread: Having issues with indexing correctly in a file

  1. #1
    Registered User
    Join Date
    Apr 2004
    Posts
    28

    Having issues with indexing correctly in a file

    Hi,

    I have a text data file that spans many lines, and I've been using fstream's 'tellg()' to work out file offsets for data that I read into a buffer (I am reading the file in chunks). However, when I try to index elements on any line other than the first, the index is slightly off.

    I'm on a Windows machine, so I know that newlines in text files are probably stored as two characters, \r\n (0xd 0xa), which would explain why the index is wrong across multiple lines.

    I'm using:

    extractedSize = fin.rdbuf()->sgetn(...

    to read in the data chunks, and I am relying on its return value being the exact number of bytes extracted from the file. My question is: does this return value count a newline as 1 character or 2, and is there some easy solution I'm missing? >_<

    Thank you.

  2. #2
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    Under Windows, you have to open the file in binary mode to prevent '\r\n'->'\n' conversions and to use tell[gp]()/seek[gp]() reliably.

    gg

  3. #3
    Registered User
    Join Date
    Apr 2004
    Posts
    28
    Thanks for the reply again, Codeplug,

    I'm now reading the file in binary mode, but my tellg() calls are still off, just slightly differently. It seems the sgetn() call is still reading in each newline as a byte (but the file doesn't count that position as an index). So when I access an index on the first line it's fine, but when I access an index on the second line I get the character to the right of the one I wanted (+1).

    Does this mean that I need to find the newlines once the data is in memory and account for them there (subtracting from the index each time I encounter one to keep the indexing valid)? Or is there a simpler way?

    Thanks.

  4. #4
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    You may need to post the code.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  5. #5
    Registered User
    Join Date
    Apr 2004
    Posts
    28
    I open the file like this:
    std::ifstream fin(filename->c_str(), std::ifstream::in|std::ifstream::binary);

    ...

    and I'm reading it in chunks like this:
    extractedSize = fin.rdbuf()->sgetn(&bufferPrimary[bufferIndex], (bufferSize - bufferIndex));

    I read it in like this because sometimes part of the old chunk needs to be in the start
    of the new chunk.

    ...

    Later on I find a character in the buffer that I need the file index for, and I push that file index onto a list:
    indexList->push_back(((int)fin.tellg() - extractedSize) + bufferIndex + 1);

    So I'm taking the current position of the file pointer (tellg), subtracting the amount I extracted for this chunk, and adding the position in the buffer of the character I want the file index for. As I said, this works for things on the first line of the file, but if I run this line for a character on the second line, the result is 1 index past the one I want, because the newline that was read in counts as another byte (in extractedSize) but is not an actual index in the file (I think).

    I'm just trying to find an easy solution for this, because I can only think of messy ones. >_<

    Thanks.

  6. #6
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    The problem is in your code somewhere. Consider the following:
    Code:
    #include <iostream>
    #include <fstream>
    #include <iomanip>
    #include <cctype>
    using namespace std;
    
    void dump_file(ifstream &fin)
    {
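        char buff[5];
        streamsize read;
        // sgetn returns the number of bytes copied; 0 means end of file
        while ((read = fin.rdbuf()->sgetn(buff, sizeof(buff))) > 0)
        {
            cout << '"';
            for (streamsize n = 0; n < read; ++n)
            {
                const char &c = buff[n];
                // the cast avoids undefined behavior for negative char values
                if (isprint(static_cast<unsigned char>(c)))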
                    cout << c;
                else
                {
                    if (c == '\r')
                        cout << "\\r";
                    else if (c == '\n')
                        cout << "\\n";
                    else
                        cout << "???";
                }//else
            }//for
    
            cout << "\"  tellg = " << fin.tellg() << endl;
        }//while
    }//dump_file
    
    int main()
    {
        ifstream fin("data.txt", ios::in | ios::binary);
        if (!fin)
        {
            cerr << "Failed to open file" << endl;
            return 1;
        }//if
    
        dump_file(fin);
    
        cout << "\n...Resetting to offset 10\n" << endl;
        fin.seekg(10, ios::beg);
    
        dump_file(fin);
    
        fin.close();
    
        return 0;
    }//main
    data.txt contains "Line1<return>Line2<return>Line3", saved with Windows (\r\n) line endings. This is the output I get with both VS 6.0 and VS 2008 (oldest and newest MS CRTs):
    Code:
    "Line1"  tellg = 5
    "\r\nLin"  tellg = 10
    "e2\r\nL"  tellg = 15
    "ine3"  tellg = 19
    
    ...Resetting to offset 10
    
    "e2\r\nL"  tellg = 15
    "ine3"  tellg = 19
    So in binary mode the file is read byte for byte and absolute file positions work as expected.
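
    One way to apply this to the chunked reader from post #5 is to record, on every read, which file offset buffer slot 0 corresponds to, instead of reconstructing it from tellg() afterwards. The following is only a minimal sketch under assumptions, not the original poster's code: ChunkReader, fill, firstNew, offsetOf and findOffsets are hypothetical names, the file is assumed to be opened in binary mode, C++11 is assumed, and the carry-over count is assumed to be smaller than the buffer size.

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads a binary file in chunks and remembers the
// absolute file offset that corresponds to buffer slot 0.
class ChunkReader {
public:
    ChunkReader(const std::string& path, std::size_t bufferSize)
        : in_(path.c_str(), std::ios::in | std::ios::binary),
          buffer_(bufferSize), used_(0), carried_(0), baseOffset_(0) {}

    bool ok() const { return in_.is_open(); }

    // Keeps the last `carry` bytes of the previous chunk at the front of the
    // buffer, then fills the rest from the file.
    // Returns the number of new bytes read (0 at end of file).
    std::size_t fill(std::size_t carry) {
        if (carry > used_) carry = used_;
        if (carry > 0)
            std::memmove(&buffer_[0], &buffer_[used_ - carry], carry);
        // Slot 0 now holds the byte that used to sit at slot used_ - carry.
        baseOffset_ += static_cast<std::streamoff>(used_ - carry);
        std::streamsize got = in_.rdbuf()->sgetn(
            buffer_.data() + carry,
            static_cast<std::streamsize>(buffer_.size() - carry));
        if (got < 0) got = 0;
        carried_ = carry;
        used_ = carry + static_cast<std::size_t>(got);
        return static_cast<std::size_t>(got);
    }

    std::size_t size() const { return used_; }         // valid bytes in buffer
    std::size_t firstNew() const { return carried_; }  // first freshly read slot
    char at(std::size_t i) const { return buffer_[i]; }

    // Absolute file offset of buffer slot i; no newline correction needed.
    std::streamoff offsetOf(std::size_t i) const {
        return baseOffset_ + static_cast<std::streamoff>(i);
    }

private:
    std::ifstream in_;
    std::vector<char> buffer_;
    std::size_t used_;
    std::size_t carried_;
    std::streamoff baseOffset_;
};

// Collects the file offset of every occurrence of `target`.
// Only freshly read bytes are scanned, so carried-over bytes are not
// reported twice. `carry` must be smaller than `bufferSize`.
std::vector<std::streamoff> findOffsets(const std::string& path, char target,
                                        std::size_t bufferSize, std::size_t carry)
{
    std::vector<std::streamoff> hits;
    ChunkReader reader(path, bufferSize);
    if (!reader.ok()) return hits;
    std::size_t keep = 0;  // nothing to carry over before the first read
    while (reader.fill(keep) > 0) {
        for (std::size_t i = reader.firstNew(); i < reader.size(); ++i)
            if (reader.at(i) == target)
                hits.push_back(reader.offsetOf(i));
        keep = carry;
    }
    return hits;
}
```

    With the 19-byte data.txt above, findOffsets("data.txt", 'L', 8, 2) should yield the offsets 0, 7 and 14, each of which can be passed straight to seekg().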

    gg

  7. #7
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    If the file isn't terribly large, I'd recommend reading the entire contents into a buffer and then just parsing the data directly...
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }
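
    The read-everything approach suggested at the top of this post could look roughly like the sketch below. It is an illustration under assumptions rather than code from the thread: readWholeFile and offsetsOf are hypothetical names, and the file is assumed to fit comfortably in memory. Because the file is opened in binary mode, an index into the string is also the byte offset in the file.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: loads the whole file, byte for byte, into a string.
// Returns an empty string if the file cannot be opened.
std::string readWholeFile(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Returns the index of every occurrence of `target` in `data`.
// For data loaded with readWholeFile, each index equals the file offset.
std::vector<std::size_t> offsetsOf(const std::string& data, char target)
{
    std::vector<std::size_t> hits;
    for (std::string::size_type pos = data.find(target);
         pos != std::string::npos;
         pos = data.find(target, pos + 1))
        hits.push_back(pos);
    return hits;
}
```

    This trades memory for simplicity: there is no chunk boundary to handle and no tellg() arithmetic at all.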
