Thread: Python --> C

  1. #61
    Registered User
    Join Date
    May 2010
    Posts
    24
    Quote Originally Posted by Adak View Post

    How does 3.2 sec's compare with Python's time on the larger file?
    It's not even worth waiting for it. It's well well beyond hard-boiled-egg-time.


    Code:
    As the data file grows, you'll need to move over to malloc'ing memory from the heap, because you can get substantially more memory, that way.
    I'll try if I can figure it out.

    Code:
    Where are all these tuples coming from?
    They are point numbers of polygons. If they share 1 point they are neighbours, if they share 2 points they share an edge and I can calculate the dihedral angle from their normals (if they are triplets )

  2. #62
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by Macha View Post
    It's not even worth waiting for it. It's well well beyond hard-boiled-egg-time.
    I'm sure one slow down is the mallocing of the memory in Python. When you start mallocing
    memory, in C, it will also slow things down a bit - not nearly as much as Python, but a little.

    Quote Originally Posted by Macha View Post
    They are point numbers of polygons. If they share 1 point they are neighbours, if they share 2 points they share an edge and I can calculate the dihedral angle from their normals (if they are triplets )
    Very interesting.

    Note to self: Don't let Macha lead you into deep water, without your life jacket!

  3. #63
    Registered User
    Join Date
    May 2010
    Posts
    24
    So...errr..... in that code a[R][C] is the array where R is fixed for the moment. Why can't we just do something like sizeof and then use that as the R value or am I missing something?

  4. #64
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Macha View Post
    So...errr..... in that code a[R][C] is the array where R is fixed for the moment. Why can't we just do something like sizeof and then use that as the R value or am I missing something?
    sizeof() what?

    If you know how big the array needs to be, just set the R to that. However, you will run into problems eventually because of stack size limitations imposed by the operating system. To get around that, you could make the array global.

    Otherwise you have to use malloc.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  5. #65
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    You mean take the size of the file, in bytes, and divide that by the number of bytes in each row of the data file, and use that for R?

    Knowing in advance what the largest possible R value would be for a stack based array, the program could decide to malloc memory, instead of using a stack based memory array, if it needed to.

    The bytes in each row would be ez: 3 * sizeof(int) on your system, + 2 * sizeof(char) for the spaces, plus 1 byte for the newline you don't see, on the end of each row.

    Add a little fudge factor to that number, and that should work.

    This is how Turbo C might get the size of a text file:

    Code:
    fseek   Repositions the file pointer of a stream.
    
     Syntax:
       int fseek(FILE *stream, long offset, int whence);
    
     Prototype in:
     stdio.h
    
     Remarks:
    fseek sets the file pointer associated with stream to a new position
    that is offset bytes from the file location given by whence.
    
    For text mode streams, offset should be 0 or a value returned by
    ftell.
    
    whence must be one of the values 0, 1, or 2, which represent three
    symbolic constants (defined in stdio.h) as follows:
    
       whence         ³ File location
    ====================================
       SEEK_SET   (0) ³ File beginning
       SEEK_CUR   (1) ³ Current file pointer position
       SEEK_END   (2) ³ End-of-file
    
    fseek discards any character pushed back using ungetc.
    
    fseek is used with stream I/O. For file handle I/O, use lseek.
    
    After fseek, the next operation on an update file can be either input
    or output.
    
     Return Value:
    fseek returns 0 if the pointer is successfully moved. It returns a
    nonzero on failure.
    
    fseek can return a zero, indicating that the pointer has been moved
    successfully, when in fact it has not been. This is because DOS,
    which actually resets the pointer, does not verify the setting.
    
    fseek returns an error code only on an unopened file or device.
    
     Portability:
    fseek is available on all UNIX systems and is defined in ANSI C.
    
     See Also:
      fgetpos      fsetpos    lseek     setbuf
      fopen        ftell      rewind    tell
    
     Example:
     #include <stdio.h>
    
     long filesize(FILE *stream);
    
     int main(void)
     {
        FILE *stream;
    
        stream = fopen("MYFILE.TXT", "w+");
        fprintf(stream, "This is a test");
        printf("Filesize of MYFILE.TXT is %ld bytes\n", filesize(stream));
        fclose(stream);
        return 0;
     }
    
     long filesize(FILE *stream)
     {
        long curpos, length;
    
        curpos = ftell(stream);
        fseek(stream, 0L, SEEK_END);
        length = ftell(stream);
        fseek(stream, curpos, SEEK_SET);
        return length;
     }
    Windows can give you this info as well, using it's API (I don't know the specifics).

  6. #66
    Registered User
    Join Date
    May 2010
    Posts
    24
    I'm trying to understand this malloc() thing.

    The code below seems to work but right at the end an error appears.

    For testing malloc on an array I created array1 like so:

    Code:
    int **array1 = malloc(r * sizeof(int *));
    and then in getTuples() there is a loop over every row in the file, so there I put:


    Code:
    array1[i] = malloc(C * sizeof(int));

    Full code:

    Code:
    /* testing a fast way to find all partial (2 of the 3 numbers) matches,
    within a large number of tuples.
    
    Because of the great locality of the data, no sorting/searching/indexing,
    etc., were found useful.
    
    Adak, May 26, 2010
    
    status: untested. 3042 matches found on Macha's 4.9sec.txt data file,
            renamed as "tuples.txt".
    
    my time: 0.05 seconds, on an Intel E6700 cpu, running at 2.66Ghz.
    */
    
    #include <stdio.h>
    #include <time.h>
    #include <stdlib.h>
    
    #define R 124000
    #define C 3
    
    int getTuples(FILE *fp, int a[R][C]);
    
    int main(void) {
    
    
      int i, j, r, c, maxrows, a[R][C] = {{0}};
      int found, match;
      clock_t start, stop;
      FILE *fp;
      printf("\n\n\n");
    
      //start=clock(); Overall program time
      if((fp=fopen("tuples.txt", "rt")) == NULL) {
        printf("\nError opening input file - exiting\n");
        return 1;
      }
    
      maxrows=getTuples(fp, a);
    
    if((fp=fopen("matches.txt", "wt")) == NULL) {
        printf("\nError opening output file - exiting\n");
        return 1;
      }
      //loaded up, ready to rock:
      start=clock();
    
      for(i=0,found=0;i<maxrows;i++) {
        for(j=i+1;j<maxrows;j++) {
          match=0;
          if(a[i][0]==a[j][0] || a[i][0]==a[j][1] || a[i][0]==a[j][2]) {match++;}
          if(a[i][1]==a[j][0] || a[i][1]==a[j][1] || a[i][1]==a[j][2]) {match++;}
          if(a[i][2]==a[j][0] || a[i][2]==a[j][1] || a[i][2]==a[j][2]) {match++;}
          if(match>1) {
            found++;
    /*
            writes result to file
            i,j are the triplet indices
    */
            fprintf(fp,"%d %d\n", i,j);
    /*
            printf("%d %d\n", i,j);
            printf("(%d %d %d)", a[i][0],a[i][1],a[i][2]);
            printf("(%d %d %d)\n\n", a[j][0],a[j][1],a[j][2]);
    */
          }
        }
      }
      printf("\n\n Partial matches found: %d", found);
      stop=clock();
      printf("\n\n elapsed time: %f", (stop-start)/(double)CLOCKS_PER_SEC);
      printf("\n\n\t\t\t    press enter when ready");
      i=getchar();
      fclose(fp);
      return 0;
    }
    
    
    int getTuples(FILE *fp, int a[R][C]) {
      int i, r, c;
      char z; //z="zee trash" char ;)
      c = 0;
      r=0;
      
      int **array1 = malloc(r * sizeof(int *));
    
    do{
    
        i= fscanf(fp, "%d %d %d ", &a[r][c],&a[r][c+1],&a[r][c+2]);
        if(i > 2){
            printf("\n%4d: %4d %4d %4d\n", r, a[r][c],a[r][c+1],a[r][c+2]);
            array1[i] = malloc(C * sizeof(int));
            array1[i][0] = a[r][0];
            array1[i][1] = a[r][1];
            array1[i][2] = a[r][2];
            printf("\narray1...%d   %d   %d\n\n",array1[i][0],array1[i][1],array1[i][2]);
            ++r;
         }
    }while((i > 2) && (r < R));
    
      printf("\n\n There are %d Rows of tuples\n", r);
    
      fclose(fp);
      return r;
    }

  7. #67
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Code:
      int i, r, c;
      char z; //z="zee trash" char ;)
      c = 0;
      r=0;
      
      int **array1 = malloc(r * sizeof(int *));
    You may want to pick a higher value than that . Also, while it's not that crucial in context, you should get in the habit of free()ing malloc'd memory -- this is where the term "memory leak" comes from.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  8. #68
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    For each row, you will need to malloc:
    Code:
    3 * sizeof(int) + 2 * sizeof(char) + 1 sizeof(char)
    =========================================
     3 <one space>    2  <one space>    1 <one newline char>
    or 3 * sizeof(int) + 3 * sizeof(char)

    In getTuples(), you can delete the variable 'z', now. (not used).

    Note that when you free memory, free the rows in a loop, first, and then free the entire array, in one fell swoop. So it's just the reverse of how you malloc'ed the memory. If you free the array instead of the array rows first, you'll have no handle to locate the inner rows and free them - so a big memory leak.

    Your operating system should recover that memory anyway, at the end of the program. If you want to build more onto this program however, then you'll want to have that leaked memory back, (and you won't if it's been leaked).

  9. #69
    Registered User
    Join Date
    May 2010
    Posts
    24
    Maybe this is not the board to mention this, but today I tried the same algorithm in C# and that 3.2-second-in-C file took 20 seconds in C#. I am sure I did many nonos but still, does that sound about right?

    I think I found a potentially useful speedup for this code as well. I haven't tried it yet but every tuple should have no more intersecting tuples as it has elements. So, in the case of triplets just 3. As soon as we found 3 intersections we can break of the loop and continue with the next item in the list, potentially saving millions of tests.

  10. #70
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by Macha View Post
    Maybe this is not the board to mention this, but today I tried the same algorithm in C# and that 3.2-second-in-C file took 20 seconds in C#. I am sure I did many nonos but still, does that sound about right?

    I think I found a potentially useful speedup for this code as well. I haven't tried it yet but every tuple should have no more intersecting tuples as it has elements. So, in the case of triplets just 3. As soon as we found 3 intersections we can break of the loop and continue with the next item in the list, potentially saving millions of tests.
    If 1,3 is the same as 3, 1, then in the 1,2,3 tuple, the possible matches are:

    1,2
    1,3
    and
    2,3

    so that's a great idea if you know you have no duplicate tuples to deal with.

    Not a C# user, but it uses the dot net framework, which has to be somewhat slower. I view it as a way different beasty. More built in expressiveness, but less useful for the fastest run-times.

  11. #71
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Macha View Post
    Maybe this is not the board to mention this, but today I tried the same algorithm in C# and that 3.2-second-in-C file took 20 seconds in C#. I am sure I did many nonos but still, does that sound about right?
    Any sort of benchmark test I've ever seen puts C well ahead of C# speed wise, altho I don't know if it's that drastic. Of course you are right about the expense of "nonos". Eg. it seems to me that your original python has a somewhat serious inefficiency, altho I don't use python so I could be wrong:
    Code:
        r = range(len(myList))
        result = []
        for i in r:
            for j in r:
                if condition_shared(myList[i],myList[j],2):
                    result.append((i,j))
    Does that mean for each iteration of i we go through the entire list again? Notice that I, Salem, and Adak all used j = i+1 for the inner loop, because there is no point in comparing a triple to all the (previous) triples that have already been compared to the list.

    That one will amount to a big big difference on a large set. Trying setting Adak's code so that j=0 each time and it will take considerably longer, I think.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  12. #72
    Registered User
    Join Date
    May 2010
    Posts
    24
    That's right MK27. That was a mistake and has been corrected, both in the Python and C# code.

    Edit: My proposed for-loop-break reduces the C# speed from 20 secs to 15 secs with what seems to be the same results.
    Last edited by Macha; 05-28-2010 at 07:03 PM.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 16
    Last Post: 11-01-2009, 12:10 AM
  2. python c api
    By ralu. in forum C Programming
    Replies: 0
    Last Post: 03-01-2009, 01:19 PM
  3. Python Embedding Help...
    By Rendered in forum C Programming
    Replies: 2
    Last Post: 11-17-2007, 10:08 AM
  4. python in c++ application
    By l2u in forum C++ Programming
    Replies: 5
    Last Post: 04-18-2007, 07:50 AM
  5. Python
    By Xterria in forum A Brief History of Cprogramming.com
    Replies: 10
    Last Post: 05-05-2002, 04:08 PM