Thread: how to use malloc to account for large data files

  1. #1
    Registered User
    Join Date
    Jun 2017
    Posts
    7

    how to use malloc to account for large data files

    Hi, I have some code that sorts my data files by adding a blank line whenever the value in the first column increases. However, when I crank up the size of these files, the code runs into a segmentation fault, which suggests that malloc() should be used to allocate the memory. I'm confused, though, because the argument to malloc() is simply the amount of memory required, which I don't know in advance, and I also don't know exactly which variable needs to be allocated dynamically. Any help here would be much appreciated!

    Code:
    #include <string.h>
    #include <math.h>
    #include <stdio.h>
    #define MAXLINE 100
     
    int main() 
    {
           printf("enter file name: ");
           char filename[MAXLINE];
           scanf("%s", filename);
     
           FILE *newfile = fopen(filename, "r");
           FILE *tempfile = fopen("tempfilename", "w");     
     
           int ch, nlines = 0;
           while ((ch = fgetc(newfile)) != EOF)
        {
                if (ch == '\n')
            {
                     nlines++;
            }
        }
     
            float doub[nlines];
            char line[MAXLINE], rest[nlines][MAXLINE];
     
            rewind(newfile);
      
            for (int i = 0; i < nlines; i++) 
        {
                fgets(line, MAXLINE, newfile);
                sscanf(line, "%f %s", &doub[i], rest[i]);
     
                if (i > 0 && doub[i] > doub[i-1])
                    {
                fputc('\n', tempfile);
            } 
                fputs(line, tempfile);
            }
     
            fclose(newfile);
            fclose(tempfile);
        rename("tempfilename", filename);
        return 0;
    }

  2. #2
    Registered User
    Join Date
    Jun 2015
    Posts
    1,640
    One possible cause of a segfault in your code is a NULL newfile or tempfile pointer. You need to check that the files actually opened.

    A more likely cause is that your arrays are too large to be held on the stack. One fix is to make them static (by adding that keyword before their definition and giving them a fixed maximum size, since a static array can't have a runtime length). That way they are not allocated on the stack.
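
    Alternatively, since malloc() is what you asked about, you could allocate the arrays on the heap instead. A rough sketch (you would also need to #include <stdlib.h> for malloc/free):
    Code:
    float *doub = malloc(nlines * sizeof *doub);
    char (*rest)[MAXLINE] = malloc(nlines * sizeof *rest);  /* nlines rows of MAXLINE chars */
    if (doub == NULL || rest == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    /* ... use doub[i] and rest[i] exactly as before ... */
    free(rest);
    free(doub);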

    But you don't actually need the arrays. You aren't using the rest array at all. You don't even need the doub array, since all you are using is the previous and current value. And since you don't need the arrays, you don't need to count the lines in the file first.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <limits.h>
    
    #define MAXLINE 1000
    #define TEMPFILENAME "tempfilename"
    
    int main() {
        printf("enter file name: ");
        char filename[MAXLINE];
        scanf("%s", filename);
    
        FILE *newfile = fopen(filename, "r");
        if (!newfile) { perror("fopen newfile"); exit(EXIT_FAILURE); }
        FILE *tempfile = fopen(TEMPFILENAME, "w");     
        if (!tempfile) { perror("fopen tempfile"); exit(EXIT_FAILURE); }
    
        float val, prev_val = INT_MAX;  /* start high so the very first line never gets a blank line inserted before it */
        char line[MAXLINE];
        while (fgets(line, MAXLINE, newfile) != NULL) {
            sscanf(line, "%f", &val);
            if (val > prev_val)
                fputc('\n', tempfile);
            fputs(line, tempfile);
            prev_val = val;
        }
    
        fclose(newfile);
        fclose(tempfile);
    
        rename(TEMPFILENAME, filename);
    
        return 0;
    }
    Last edited by algorism; 07-03-2017 at 09:17 AM.

  3. #3
    Salem
    and the hat of int overfl
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Indented for win.
    Code:
    #include <string.h>
    #include <math.h>
    #include <stdio.h>
    #define MAXLINE 100
    
    int main()
    {
      printf("enter file name: ");
      char filename[MAXLINE];
      scanf("%s", filename);
    
      FILE *newfile = fopen(filename, "r");
      FILE *tempfile = fopen("tempfilename", "w");
    
      int ch, nlines = 0;
      while ((ch = fgetc(newfile)) != EOF) {
        if (ch == '\n') {
          nlines++;
        }
      }
    
      float doub[nlines];
      char line[MAXLINE], rest[nlines][MAXLINE];
    
      rewind(newfile);
    
      for (int i = 0; i < nlines; i++) {
        fgets(line, MAXLINE, newfile);
        sscanf(line, "%f %s", &doub[i], rest[i]);
    
        if (i > 0 && doub[i] > doub[i - 1]) {
          fputc('\n', tempfile);
        }
        fputs(line, tempfile);
      }
    
      fclose(newfile);
      fclose(tempfile);
      rename("tempfilename", filename);
      return 0;
    }
    Make sure your code is as presentable as possible if you want people to care about looking at it.

    > float doub[nlines];
    > char line[MAXLINE], rest[nlines][MAXLINE];
    These are likely to be stored on the stack.
    Bear in mind that stack space is usually quite restricted on most machines - somewhere between 1MB and 8MB is very common.
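    For a sense of scale: with MAXLINE at 100, a file of 100,000 lines makes rest alone about 100,000 * 100 bytes, i.e. roughly 10 MB, which already blows past a typical 8 MB stack before doub is even counted.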

    Code:
        sscanf(line, "%f %s", &doub[i], rest[i]);
    
        if (i > 0 && doub[i] > doub[i - 1]) {
          fputc('\n', tempfile);
        }
    1. You never use rest[i] at all, so why bother storing all that data?
    2. You're only ever interested in the previous value ( doub[i] > doub[i - 1] ), so you only need a single previous-value variable, not a whole array (see the sketch after this list).
    3. If your condition fails, you don't output a newline - how is this sorting data?
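
    Here's a rough, untested sketch of point 2 using your own loop, with the arrays replaced by a couple of scalars:
    Code:
    float val, prev = 0;
    int first = 1;
    while (fgets(line, MAXLINE, newfile) != NULL) {
      if (sscanf(line, "%f", &val) == 1) {
        if (!first && val > prev) {
          fputc('\n', tempfile);
        }
        prev = val;
        first = 0;
      }
      fputs(line, tempfile);
    }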

    Before you worry about using malloc for large files, how about making sure your code works for small files.


    Also posted here
    Code to sort a txt file line by line by order of increasing value
    And here! -> Code To Sort Data Into Ascending Order - C And C++ | Dream.In.Code
    Read this -> How To Ask Questions The Smart Way
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
