Thread: Help with file manipulation.

  1. #16
    CornedBee (Cat without Hat)
    Join Date
    Apr 2003
    Posts
    8,895
    Code:
    bool LoadData (ifstream Data)
    This is suspicious. It shouldn't be possible to pass an ifstream by value. It should be passed by reference. If passing by value compiles, then your compiler has a problem, and I have no idea what the behavior of this code is.
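    A minimal sketch of the by-reference version, assuming a simplified signature: the `count` and `entries` parameters and the helper `SecondBatchFirstEntry` are hypothetical, since the real body of LoadData isn't shown. Taking `std::istream&` means an `ifstream` can be passed in directly, and the caller's stream keeps its read position between calls.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for LoadData: reads up to `count` whitespace-separated
// entries. The stream is taken by reference, so the caller's own stream object
// (and its read position) is the one being advanced.
bool LoadData(std::istream& data, std::size_t count, std::vector<std::string>& entries)
{
    entries.clear();
    std::string entry;
    while (entries.size() < count && data >> entry)
        entries.push_back(entry);
    return entries.size() == count;   // false once the input runs short
}

// Two consecutive calls on the same stream: the second call continues where
// the first one stopped instead of starting over or failing.
std::string SecondBatchFirstEntry()
{
    std::istringstream input("a1 a2 b1 b2");
    std::vector<std::string> batch;
    LoadData(input, 2, batch);        // consumes a1 a2
    LoadData(input, 2, batch);        // consumes b1 b2
    assert(batch.size() == 2);
    return batch.front();
}
```

    With a real file, the call site would look like `std::ifstream in("data.txt"); LoadData(in, 100000, batch);` where the file name is only an example.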
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  2. #17
    Registered User
    Join Date
    Mar 2012
    Posts
    14
    The program is now working, and the performance improvement is massive. I'm now looking at a different area to improve, again related to files. This time, the issue is with the output files.

    I take 100,000 entries from the input file, process them, sort them and then save them into an intermediate file. Get the next 100,000, process, sort and save into another intermediate file. Repeat until the input file is emptied. Then I need to merge the resulting intermediate files into one large file.

    The way I'm currently doing that is to grab the first 2 and merge them, discarding any duplicates. Save the results into another intermediate file and delete those two. Grab the next two and merge. Repeat until all the files are merged down into 1 final file.

    This works, but it isn't very efficient: it creates about as many extra intermediate files as there were originals, and those files keep growing as the merges progress. What I'd prefer to do is create an array of file pointers, open ALL of the original intermediate files at once, and merge them all into one file in a single pass.

    What I'm wondering is: what is the limit on the number of files I can have open concurrently? I'm currently working on a Windows XP (5.1) SP3 system. I've searched Google but can't find anything definitive about how many files a single program can have open at once, or whether that number can be adjusted. Does anyone know, or know where I should look to find out?
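    On the limit itself: the Microsoft C runtime documentation describes a default cap of 512 simultaneously open stdio streams, adjustable with `_setmaxstdio`. Whether and how that applies to the streams in Visual C++ 6.0 should be checked rather than taken from here. The single-pass merge described above can be sketched as follows, assuming each intermediate file is already sorted and holds whitespace-separated entries. `MergeSorted` and `DemoMerge` are hypothetical names, and in-memory streams stand in for the files.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <ostream>
#include <queue>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Merges already-sorted inputs into `out`, one entry per line, dropping
// duplicates. Only one pending entry per input is held in memory.
// Returns the number of entries written.
std::size_t MergeSorted(const std::vector<std::istream*>& inputs, std::ostream& out)
{
    typedef std::pair<std::string, std::size_t> Item;   // (entry, index of its source)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item> > heap;

    std::string entry;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        assert(inputs[i] != 0);
        if (*inputs[i] >> entry)            // prime the heap with each input's first entry
            heap.push(Item(entry, i));
    }

    std::size_t written = 0;
    std::string last;
    while (!heap.empty()) {
        Item smallest = heap.top();
        heap.pop();
        if (written == 0 || smallest.first != last) {   // skip repeats of the last output
            out << smallest.first << '\n';
            last = smallest.first;
            ++written;
        }
        if (*inputs[smallest.second] >> entry)          // refill from the same source
            heap.push(Item(entry, smallest.second));
    }
    return written;
}

// Small demonstration with three in-memory "files".
std::string DemoMerge()
{
    std::istringstream a("a c e"), b("b c d"), c("a f");
    std::vector<std::istream*> inputs;
    inputs.push_back(&a);
    inputs.push_back(&b);
    inputs.push_back(&c);
    std::ostringstream out;
    MergeSorted(inputs, out);
    return out.str();
}
```

    With real files the vector would hold pointers to open `std::ifstream` objects. If the number of files exceeds what the runtime allows, the same function can be applied in batches.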

    Originally posted by CornedBee:
    This is suspicious. It shouldn't be possible to pass an ifstream by value. It should be passed by reference. If passing by value compiles, then your compiler has a problem, and I have no idea what the behavior of this code is.
    It did compile, and it did run and work through the first call, but not the second.

    If it matters, it was compiled with the Microsoft Visual C++ 6.0 compiler.

  3. #18
    CornedBee (Cat without Hat)
    Join Date
    Apr 2003
    Posts
    8,895

  4. #19
    Registered User
    Join Date
    Mar 2012
    Posts
    14
    Thanks.

    I'll go away and stop bugging everyone now.

    Again, thanks.

  5. #20
    Registered User
    Join Date
    Mar 2012
    Posts
    14
    Okay, I'm back again.

    Now I'm trying to improve the efficiency of my file merging by merging multiple files at once. But I've again hit a snag. I suspect it is the same issue that CornedBee pointed out cropping up again, but am unsure.

    Here is the relevant function which is crashing:
    Code:
    typedef struct inputfiles
    {
        char sCurrData[50];
        char sFileName[80];
        ifstream Data;
        struct inputfiles *next;
    
    }INPUTFILES;
    
    typedef INPUTFILES * pInputFiles;
    
    void Debug2 ()
    {
        char sData[360] = {0},
             sData2[360] = {0},
             sFile[80];
        int iCurr,
            iDup,
            iLast,
            x,
            iFilesDone,
            iOffset = 7,
            iLevel = 15;
        inputfiles sFileList;
        inputfiles *pFileList = 0,
                   *pTemp = 0;
        ofstream Output;
    
        iCurr = 0;
        iDup = 0;
    
        sFileList.next = 0;
    
        for (x = 0; (x - 1) < ( (iOffset - 1) / MERGESIZE); x++)
        {
            for (iLast = 0; (iLast < MERGESIZE) && (iLast + x * MERGESIZE) < iOffset; iLast++)
            {
                pFileList = (pInputFiles)calloc (1, sizeof (inputfiles) );
                sprintf (sFile, "c:\\dev\\connect4\\connect4\\level%d_%d.txt", iLevel, (iLast + 1 + x * MERGESIZE) );
                (pFileList->Data).open (sFile);
                pFileList->Data >> pFileList->sCurrData;
                strcpy (pFileList->sFileName, sFile);
    
                if (    (sFileList.next == 0)
                     || (strcmp (pFileList->sCurrData, sFileList.sCurrData) < 0)
                   )
                {
                    pFileList->next = sFileList.next;
                    sFileList.next = pFileList;
                }
                else
                {
                    pTemp = sFileList.next;
    
                    while (    (pTemp->next)
                            && (strcmp (pFileList->sCurrData, pTemp->sCurrData) < 0)
                          )
                    {
                        pTemp = pTemp->next;
                    }
                    pFileList->next = pTemp->next;
                    pTemp->next = pFileList;
                }
            }
    
            if (iLast + x * MERGESIZE < iOffset)
            {
                iOffset++;
                sprintf (sFile, "c:\\dev\\connect4\\connect4\\level%d_%d.txt", iLevel, iOffset);
                Output.open (sFile);
            }
            else
            {
                sprintf (sFile, "c:\\dev\\connect4\\connect4\\level%d.txt", iLevel);
                Output.open (sFile);
            }
            iFilesDone = 0;
    
            while (sFileList.next)
            {
                if (    (sFileList.next->next)
                     && (strcmp (sFileList.next->sCurrData, sFileList.next->next->sCurrData) == 0)
                   )
                {
                    iDup++;
                }
                else
                {
                    Output << sFileList.next->sCurrData << '\n';
                    iCurr++;
                }
                sFileList.next->Data >> sFileList.next->sCurrData;
                pTemp = sFileList.next;
    
                if (strlen (pTemp->sCurrData) < 20)
                {
                    sFileList.next = pTemp->next;
                    pTemp->Data.close();
    //                remove (pTemp->sFileName);
                    free (pTemp);
                    iFilesDone ++;
                    cout << "Completed file " << iFilesDone << " of " << iLast << ".\n";
                }
                else
                {
                    pFileList = pTemp;
    
                    while (    (pTemp->next)
                            && (strcmp (pFileList->sCurrData, pTemp->next->sCurrData) < 0)
                          )
                    {
                        pTemp = pTemp->next;
                    }
    
                    if (pFileList != pTemp)
                    {
                        sFileList.next = pFileList->next;
                        pFileList->next = pTemp->next;
                        pTemp->next = pFileList;
                    }
                }
    
    //            if (iCurr % 100000 == 0)
                {
                    cout << iCurr << " good. " << iDup << " duplicates.\n";
                }
            }
    
            Output.close();
        }
    }
    I originally was just using an array of structs to hold the file pointers. But I ran into a problem where the first pass worked fine, but every pass after failed to access the data.

    The current version shown above crashes with an access violation as soon as I try to open the first file.

    Am I just being stupid? Is it something simple, like declaring the ifstream inside the struct as a pointer instead?
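    A likely cause of the access violation, offered as a reading of the code above rather than a confirmed diagnosis: `calloc` only zero-fills memory and never runs the `ifstream` constructor, so `Data.open` operates on an object that was never constructed. A zero-filled `ifstream*` member would have the same problem, since it is just a null pointer until something is assigned to it. Below is a minimal sketch that allocates the node with `new` and releases it with `delete`. The struct `InputFile`, the helper `FirstEntryViaNew`, and the scratch file name used to call it are all hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Simplified, hypothetical version of the list node from the post.
struct InputFile
{
    std::string   currData;   // most recently read entry
    std::string   fileName;
    std::ifstream data;
    InputFile*    next;

    explicit InputFile(const std::string& name)
        : fileName(name), data(name.c_str()), next(0)
    {
        data >> currData;     // stays empty if the file is missing or empty
    }
};

// Writes a small scratch file, opens it through a node created with `new`,
// and returns the first entry that the node read.
std::string FirstEntryViaNew(const char* path)
{
    {
        std::ofstream scratch(path);
        scratch << "first second\n";
    }                                        // scratch is flushed and closed here

    InputFile* node = new InputFile(path);   // constructors of the members run
    assert(node->data.is_open());
    std::string first = node->currData;
    delete node;                             // destructors run, closing the file
    std::remove(path);
    return first;
}
```

    The rest of the list handling can stay as it is. The point is only that every node goes through `new`/`delete` (never `calloc`/`free`), so the stream inside it is a real object.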

  6. #21
    Registered User
    Join Date
    Mar 2012
    Posts
    14
    MERGESIZE is currently defined as 3 for debug purposes. Once I get the test cases working, MERGESIZE will be set to 1000 and iOffset and iLevel will be passed in to the final function.

  7. #22
    Registered User
    Join Date
    Mar 2012
    Posts
    14
    Well, I still don't know what is wrong with that code, but I found a workaround.

    First, I tried changing the ifstream to a pointer, but that just caused an Access Violation to occur as soon as I tried to open a file.

    I couldn't seem to figure that one out either, so I then decided to try a different approach.

    Since the program worked for a single pass (before I switched it to using pointers), I took the for loop out and just made the function do a single pass. Then I put the for loop into a wrapper function which calls the file merging function.

    That seems to be working.
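
    One possible explanation for why a fresh call per pass works while the earlier array version failed after the first pass (an assumption, since that version isn't shown): a stream that has been read to end-of-file keeps its eof/fail flags, and before C++11 `open` did not reset them, so every read after reopening failed immediately. Calling `clear()` before reopening avoids that. A minimal sketch, with `CountEntries`, `SecondPassCount` and the scratch file name all hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>

// Counts the whitespace-separated entries in `path`, reusing a stream object
// that may already have been read to end-of-file on an earlier pass.
std::size_t CountEntries(std::ifstream& in, const char* path)
{
    if (in.is_open())
        in.close();
    in.clear();               // drop eofbit/failbit left over from the previous pass
    in.open(path);
    assert(in.is_open());

    std::size_t count = 0;
    std::string entry;
    while (in >> entry)
        ++count;
    return count;             // the stream is now at EOF with its flags set again
}

// Two passes over the same scratch file using a single stream object.
// Returns the second pass's count, or 0 if the passes disagree.
std::size_t SecondPassCount(const char* path)
{
    {
        std::ofstream scratch(path);
        scratch << "one two three\n";
    }
    std::ifstream in;
    std::size_t first  = CountEntries(in, path);
    std::size_t second = CountEntries(in, path);
    in.close();
    std::remove(path);
    return first == second ? second : 0;
}
```

    Constructing a new stream inside a function that is called once per pass, as in the workaround above, sidesteps the issue because each stream starts with clean flags.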
