Thread: Maximum matrix size? / reading big arrays of data

  1. #1
    Registered User
    Join Date
    Jul 2009
    Posts
    9

    Maximum matrix size? / reading big arrays of data

    Dear All,

    Please help me with my problem. My program needs to open many files of numbers and read those numbers into a matrix. I can read only 22 files; when I try to open the next one, I get a Segmentation fault. There is no problem with the file itself: all the files are created the same way, and if I double the timestep I can get past the file where reading stopped, but the program again stops after 22 files. Is there a limit on the size of a matrix?
    I will be very grateful for any help.

    In the ASCII file, the first line holds a char symbol (I read it into a char variable). The following lines hold floating-point numbers (6 in one line). Below is an example.
    Code:
    C
    16.645432 7.813343 15.514420 
    16.665323 8.334960 15.687680 
    14.005162 7.725507 16.015217
    I create a matrix.

    Code:
     double CRx[N_molC][N_beadC][N_files];
      double CRy[N_molC][N_beadC][N_files];
      double CRz[N_molC][N_beadC][N_files];
    Then I read the numbers into the matrix.


    Code:
    t=0;
    while(t<N_files)
    {
      for( j=0; j < N_molC; j++)
      {
        for( i=0; i < N_beadC; i++)
        {
          fscanf(fp,"%f",&tmpfloat1);
          fscanf(fp,"%f",&tmpfloat2);
          fscanf(fp,"%f",&tmpfloat3);
          fscanf(fp,"\n");

          CRx[j][i][t]=(double) tmpfloat1;
          CRy[j][i][t]=(double) tmpfloat2;
          CRz[j][i][t]=(double) tmpfloat3;
        }
      }
      fclose(fp);

      t=t+1;
    }

  2. #2
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    This is an operating system restriction, not C's.

    You may be able to set an environmental (global) variable in your OS to allow more files to open, but there will always be a limit.

    As a practical matter, I would also try to limit the number of files you have open at any one time. Each file handle reserved by the OS uses up a little bit of memory.

  3. #3
    pwning noobs Zlatko's Avatar
    Join Date
    Jun 2009
    Location
    The Great White North
    Posts
    132
    Elya, if you think the number of open files is the problem, then write a simple test to just open the files. That would rule out other causes of your seg fault. A good practice is to close files when they're no longer needed. Always check the result of your file open. If it fails, you can get diagnostic information from the errno global variable, or the perror, or strerror functions. I suspect that the seg fault is not from the size of your matrices or the number of open files, but rather from memory corruption or a file not found.

  4. #4
    Registered User
    Join Date
    Jul 2009
    Posts
    9
    Thanks a lot for your answers. I probably did not explain my code well. I do not open more than one file at the same time. I open each file inside a while loop and close it after I have read everything I need.
    And I definitely have all the data files and they are not corrupted. I have checked this.

    Code:
    t=0;
    while(t<N_files)
    {
      sprintf(systemname,"cldpd32_%d",t);

      sprintf(direc,".");

      fp=fopen(systemname,"r");

      for( j=0; j < N_molC; j++)
      {
        for( i=0; i < N_beadC; i++)
        {
          fscanf(fp,"%f",&tmpfloat1);
          fscanf(fp,"%f",&tmpfloat2);
          fscanf(fp,"%f",&tmpfloat3);
          fscanf(fp,"\n");

          CRx[j][i][t]=(double) tmpfloat1;
          CRy[j][i][t]=(double) tmpfloat2;
          CRz[j][i][t]=(double) tmpfloat3;
        }
      }
      fclose(fp);

      t=t+1;
    }

  5. #5
    pwning noobs Zlatko's Avatar
    Join Date
    Jun 2009
    Location
    The Great White North
    Posts
    132
    I cannot see the problem from the code you've posted. Check that systemname is big enough to hold the file name, including the terminating null character. Check that fp is not NULL after the fopen. Check that your arrays are indeed big enough to hold your tmpfloats. Check that direc is at least 2 bytes long. It would help to post complete functions.

  6. #6
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > double CRx[N_molC][N_beadC][N_files];
    Are these local arrays, or global?
    What values are N_molC etc?

    Have you tried working out (or printing) the value of say sizeof(CRx) ?

    > sprintf(systemname,"cldpd32_%d",t);
    > sprintf(direc,".");
    Are systemname and direc actually char arrays with enough space to store the strings you generate?


    For segfaults, we pretty much need a minimal COMPLETE program which demonstrates the problem. Posting the code where the problem happens isn't always enough. More often than not, the problem was originally caused by some earlier code and it's just the later code which blows up.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  7. #7
    Registered User
    Join Date
    Jul 2009
    Posts
    9
    Dear all,

    Thanks for your remarks. Unfortunately I still cannot solve my problem. I did not post the whole program because I did not want to bother you with irrelevant details. It seems that I was wrong.
    Please find the code below. When the number of files is 24 and tmax is 240, everything is OK: the program works as it should, and all numbers are read from the files and written to the matrices without problem. If I set N_files=25 and tmax=250, I get SEGMENTATION FAULT.

    Code:
    #include <stdio.h>
    #include <math.h>
    
    static int N_molC=356;
    static int N_molPOL=356;
    static int N_beadPOL=16;
    static int N_beadC=4;
    static int N_files=24;
    static int tmax=240;
    
    main(void)
    {
      char filename[FILENAME_MAX];
      char systemname[FILENAME_MAX];
      char direc[FILENAME_MAX];
      int t,time;
       
      double CRx[N_molC][N_beadC][N_files];
      double CRy[N_molC][N_beadC][N_files];
      double CRz[N_molC][N_beadC][N_files];
      double CVx[N_molC][N_beadC][N_files];
      double CVy[N_molC][N_beadC][N_files];
      double CVz[N_molC][N_beadC][N_files];
    
      double POLRx[N_molPOL][N_beadPOL][N_files];
      double POLRy[N_molPOL][N_beadPOL][N_files];
      double POLRz[N_molPOL][N_beadPOL][N_files];
      double POLVx[N_molPOL][N_beadPOL][N_files];
      double POLVy[N_molPOL][N_beadPOL][N_files];
      double POLVz[N_molPOL][N_beadPOL][N_files];
      
      char molname;
      char molnamePOL1;
      char molnamePOL2;
      char molnamePOL3;
    
      int i,j,k,l,m;               
    
      FILE *fp,*fp1;
      float tmpfloat1, tmpfloat2, tmpfloat3, tmpfloat4, tmpfloat5, tmpfloat6;
      float tmpfloatP1, tmpfloatP2, tmpfloatP3, tmpfloatP4, tmpfloatP5, tmpfloatP6;
    
      
      time=0;
      t=0;
    
      while(t<N_files)
    
    {
    
      sprintf(systemname,"cldpd32_%d",time);
    
      sprintf(direc,".");
      
      fp=fopen(systemname,"r");
    
    
    //reading Crosslinker molecule C
      
      fscanf(fp,"%c", &molname);
    
      for( j=0; j < N_molC; j++)
         {
           
       for( i=0; i < N_beadC; i++)
         {       
           fscanf(fp,"%f",&tmpfloat1);
           fscanf(fp,"%f",&tmpfloat2);
           fscanf(fp,"%f",&tmpfloat3);
           fscanf(fp,"%f",&tmpfloat4);
           fscanf(fp,"%f",&tmpfloat5);
           fscanf(fp,"%f",&tmpfloat6);
           fscanf(fp,"\n");
                
           CRx[j][i][t]=(double) tmpfloat1;
           CRy[j][i][t]=(double) tmpfloat2;
           CRz[j][i][t]=(double) tmpfloat3;
           CVx[j][i][t]=(double) tmpfloat4;
           CVy[j][i][t]=(double) tmpfloat5;
           CVz[j][i][t]=(double) tmpfloat6;
      
    
         }
       
        }
      
    //reading polymer molecule POL
    
       fscanf(fp,"\n");
       fscanf(fp,"%c",&molnamePOL1);
       fscanf(fp,"%c",&molnamePOL2);
       fscanf(fp,"%c",&molnamePOL3);
       fscanf(fp,"\n");
    
       for( j=0; j < N_molPOL; j++)
         {
           
       for( i=0; i < N_beadPOL; i++)
         {       
           fscanf(fp,"%f",&tmpfloatP1);
           fscanf(fp,"%f",&tmpfloatP2);
           fscanf(fp,"%f",&tmpfloatP3);
           fscanf(fp,"%f",&tmpfloatP4);
           fscanf(fp,"%f",&tmpfloatP5);
           fscanf(fp,"%f",&tmpfloatP6);
           fscanf(fp,"\n");
        
           POLRx[j][i][t]=(double) tmpfloatP1;
           POLRy[j][i][t]=(double) tmpfloatP2;
           POLRz[j][i][t]=(double) tmpfloatP3;
           POLVx[j][i][t]=(double) tmpfloatP4;
           POLVy[j][i][t]=(double) tmpfloatP5;
           POLVz[j][i][t]=(double) tmpfloatP6;
    
              
         }
    
         }
    
    
      fclose(fp);
    
      time=time+10;
      t=t+1;
    
     }
    
    }
    The ASCII file is shown below. It contains the coordinates and velocities of the beads of type C and POL: first 1424 lines of C beads, then 5696 lines of POL beads.

    Code:
    C
    16.423653 8.173648 17.711168 1.228649 -0.611223 0.298042
    16.245525 8.398842 17.282671 -2.376466 0.008627 1.384713
    15.967139 8.242185 18.175253 -1.553412 1.382957 2.867683
    16.827858 7.964845 17.719322 -2.319486 -0.596886 -1.529252
    7.596681 11.735932 11.202373 0.756950 -1.676208 -0.480435
    ...............................................................
    
    POL
    19.387753 19.040670 4.157248 0.417328 -0.882778 -0.047030
    17.290615 0.016091 5.112263 1.836175 0.295644 0.420497
    17.561481 19.656910 4.736860 0.333807 -0.743066 -0.833660
    18.061928 0.007483 4.405063 0.716247 1.567799 0.496205
    18.464615 19.746244 4.112550 1.716511 -0.732745 0.883491
    ......................................................................

  8. #8
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Looks like you're making up too many 3D arrays, and running out of the memory you were allocated by the OS.

    Are you using Turbo or Borland C by any chance? If so, you need to update to a 32 bit compiler (at least), to allow more memory to be accessed by your C program. Gcc for linux, or MS Visual Express are two free 32 bit compilers.

    Find out how big your double data type is (in bytes) on your system. Then you can figure out how much memory you need, and whether your OS and PC hardware can deliver it to your C program.

    Perhaps you could process some of the data, and then load up the second part of the data to be worked on.
    Last edited by Adak; 07-07-2009 at 05:08 AM.

  9. #9
    The larch
    Join Date
    May 2006
    Posts
    3,573
    Your arrays just take too much space on the stack, far more than the 1 or 2 megabytes that is the capacity of the stack. Not sure if making them global would help, or you may need to allocate them dynamically. Or do you need to load data from all files at the same time, rather than working with one file's data at a time?
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  10. #10
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    There are always limitations imposed by the system you run on concerning size of arrays. No way to avoid that.

    You can (almost) halve your memory usage by making your arrays of float, rather than of double. Since you're reading values as floats and then converting to double, you presumably don't need the extra precision. However, that just changes the size of arrays you can get before your code falls over.

    It is generally a good idea to check the return value from I/O functions (fopen, fscanf, fclose) just in case an error occurs. Similarly, do any checks you can on the values read. Given that your code is playing with huge arrays, even a small error in one operation can have a large effect if it causes a buffer overrun or some other invalid operation.

    When creating your filenames (your monkeying around with systemname) make sure the array is of sufficient size to hold all data you might write to it.

    Other than that, try to redesign your program so you don't need so many arrays in memory at once. That might mean reading your file in in stages, or looping back and reading it again (depending on what you do with the data). That may affect program performance - depending on how your subsequent code does things, and whether you need to read the same data multiple times. But, as a rough rule, a program that runs slow but eventually produces the right results is often preferable to a program that runs out of memory.

  11. #11
    Registered User
    Join Date
    Jul 2009
    Posts
    9
    Quote Originally Posted by Adak View Post

    Are you using Turbo or Borland C by any chance? If so, you need to update to a 32 bit compiler (at least), to allow more memory to be accessed by your C program. Gcc for linux, or MS Visual Express are two free 32 bit compilers.
    I use gcc compiler for linux.

  12. #12
    pwning noobs Zlatko's Avatar
    Join Date
    Jun 2009
    Location
    The Great White North
    Posts
    132
    Hello Elya. Looks like you are hitting a system limit. You are asking for 8,544,000 bytes of stack space in your arrays when N_files = 25. You may be able to increase the stack space through some kernel parameter but really, for data of this size you should rethink your implementation. Do you absolutely need to have all the data in memory at one time?
    You might have more luck if you allocate your data arrays on the heap using malloc like this:

    double* CRx = (double*)malloc(N_molC * N_beadC * N_files * sizeof(double));

    Then to access your array, you need a bit more work to treat the mallocd array as your three dimensional array. Here is an example to show they are equivalent.

    Code:
    #include <stdio.h>
    #include <math.h>
    #include <stdlib.h>
    #include <assert.h>
    
    static const int N_molC=2;
    static const int N_beadC=3;
    
    static const int N_files=4;
    
    #define CRx_VALUE(mol,bead,file) CRx[(mol)*N_files*N_beadC + (bead)*N_files + (file)]
    
    int main(void)
    {
    
    	double a[N_molC][N_beadC][N_files];
    	int i = 0;
    	for(int x = 0; x<N_molC; ++x)
    		for(int y = 0; y<N_beadC; ++y)
    			for(int z = 0; z<N_files; ++z)
    				a[x][y][z] = i++;
    
    
    
    	double* CRx = (double*)a;
    	for(int x = 0; x<N_molC; ++x)
    		for(int y = 0; y<N_beadC; ++y)
    			for(int z = 0; z<N_files; ++z)
    				if (a[x][y][z] != CRx_VALUE(x,y,z))
    				{
    					fprintf(stderr, "error\n");
    					assert(0);
    				}
    	return 0;
    }
    Instead of saying


    CRx[j][i][t]=(double) tmpfloat1;
    you'd say
    CRx_VALUE(j,i,t) =(double)tmpfloat1;

    By using the #define to access the element instead of a function call, you are essentially doing what the computer has to do anyway to determine the location of the element in a 3 dimensional array. I think there is no extra overhead.

    When you access such a large amount of data, you can expect paging. You should try to organize your data so that access is not all over the place in memory. For example, if you had a 3 dimensional array, and much of your code had the first index changing with the second and third constant, then you would be jumping across many elements between accesses. You should then reorganize your data so that it's the last index that changes and the first two remain relatively constant. Then you'd be accessing consecutive memory locations and you could take advantage of CPU caching and less paging.


    EDIT:
    OK, this is embarrassing, but earlier I gave you the wrong formula for the define. I have corrected it above.
    Last edited by Zlatko; 07-07-2009 at 08:03 AM.

  13. #13
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > double POLRx[N_molPOL][N_beadPOL][N_files];
    Each one of these is a megabyte, and you've got 6 of these.
    Then the others add a huge block as well.

    Bear in mind that the default stack allocation is 1MB, and you've blown that away before you start.

    Here's a quick fix!
    Code:
    static double POLRx[N_molPOL][N_beadPOL][N_files];
    Making all your large arrays static takes them off the stack, and into program data segment (where you've got a lot more memory).

  14. #14
    pwning noobs Zlatko's Avatar
    Join Date
    Jun 2009
    Location
    The Great White North
    Posts
    132
    Quote Originally Posted by Salem View Post
    > double POLRx[N_molPOL][N_beadPOL][N_files];
    Here's a quick fix!
    Code:
    static double POLRx[N_molPOL][N_beadPOL][N_files];
    Making all your large arrays static takes them off the stack, and into program data segment (where you've got a lot more memory).
    That's actually much nicer than allocating on the heap, as long as your array dimensions stay constant.

  15. #15
    Registered User
    Join Date
    Jul 2009
    Posts
    9
    Dear All,

    Thanks a lot for your advice. I made the matrices static.
    Code:
    static double POLRx[N_molPOL][N_beadPOL][N_files];
    ......
    but I got an error message.

    Code:
    mean_square_forum.c:10: error: variably modified ‘CRx’ at file scope
    mean_square_forum.c:11: error: variably modified ‘CRy’ at file scope
    .........................
    Is this actually allowed in C? I've read that such a matrix declaration is permitted only in C++. Is this true?

    Thanks
    Elya
