Thread: Line Counting

  1. #16
    Registered User
    Join Date
    Mar 2005
    Posts
    135
    This little thread got me thinking and, decided that I should make a solution. You're welcome =):

    Code:
    // Include these preprocessor directives
    // before function definition (usually
    // at top of file before main(...);
    
    #if EOF != -2
    #define BOF -2
    #else BOF -1
    #endif
    
    /// Prototype declaration
    int GetFileLineAmount(FILE*);
    
    // ************************************
    // @- definition:
    // int GetFileLineAmount(FILE *fp);
    // ************************************
    // This function gets the number of
    // lines (one or more characters),
    // independent of the existance of
    // the newline character ('\n').
    // ************************************
    
    int GetFileLineAmount(FILE *fp)
    {
    	int ch, prev, cnt;
    
    	for ( cnt = 0, prev = BOF; (ch = fgetc(fp)) != EOF; prev = ch )
    		if ( ch == '\n' )
    			if ( (prev != '\n') && (prev != BOF) )
    				cnt++;
    
    	if ( (prev != BOF) && (prev != '\n') )
    		cnt++;
    
    	return cnt;
    }
    - xeddiex

  2. #17
    Registered User
    Join Date
    Jul 2003
    Posts
    110
    Well alrighty then!

    It seems like we need a code-contest to resolve this, so I did one. I used my proposal, Waltp's and xeddiex's (Waltp's seemed to cover all of the fgets cases) and a really, really huge file (the actors.list from the IMDb Internet Movie Database).

    Here are the three contestants:
    Code:
    #if EOF != -2
    #define BOF -2
    #else
    #define BOF -1
    #endif
    
    /* .......................................... */
    int GetFileLineAmount(FILE *fp)
    {
    	int ch, prev, cnt;
    
    	for ( cnt = 0, prev = BOF; (ch = fgetc(fp)) != EOF; prev = ch )
    		if ( ch == '\n' )
    			if ( (prev != '\n') && (prev != BOF) )
    				cnt++;
    
    	if ( (prev != BOF) && (prev != '\n') )
    		cnt++;
    
    	return cnt;
    }
    
    /* .......................................... */
    int waltp(FILE *fp) {
    	int nl = 0;
    	char buf[BUFSIZ];
    	while ( fgets(buf, sizeof buf, fp) )
    		if ( strchr(buf, '\n') ) ++nl;
    	if ( !strchr(buf, '\n') ) ++nl;
    	return nl;
    }
    
    /* .......................................... */
    int whoie(FILE *fp) {
    	int c;
    	int nl = 0;
    	while ( (c = getc(fp)) != EOF ) {
    		++nl;
    		if ( c != '\n' ) {
    			fscanf(fp, "%*[^\n]");
    			getc(fp);
    		}
    	}
    	return nl;
    }
    and a little test driver:
    Code:
    int main(void) {
    	FILE *fp = fopen("actors.list", "r");
    	if ( !fp ) exit(EXIT_FAILURE);
    
    	printf("xeddiex found %d line(s).\n", GetFileLineAmount(fp));
    	rewind(fp);
    	printf("waltp found %d line(s).\n", waltp(fp));
    	rewind(fp);
    	printf("whoie found %d line(s).\n", whoie(fp));	
    
    	return 0;
    }
    and the result of the trial run:
    Code:
    whoie@linux:~/database> gcc -O3 -pg *.c -o lc
    whoie@linux:~/database> ./lc
    xeddiex found 2087613 line(s).
    waltp found 2552802 line(s).
    whoie found 2552802 line(s).
    whoie@linux:~/database> wc -l actors.list
    2552802 actors.list
    whoie@linux:~/database> ls -l actors.list
    -rw-r--r--  1 whoie users 104248367 2006-03-04 17:25 actors.list
    whoie@linux:~/database> gprof -b lc >graph
    xeddiex you need to debug a bit more! waltp and whoie seem to agree with wc about the number of lines in the file. So waltp and whoie pass one (which is not to say "all") test for correctness.

    and finally, the timing profile of each proposal:
    Code:
    Flat profile:
    
    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total           
     time   seconds   seconds    calls   s/call   s/call  name    
     84.40      3.30     3.30        1     3.30     3.30  GetFileLineAmount
     12.79      3.80     0.50        1     0.50     0.50  waltp
      2.81      3.91     0.11        1     0.11     0.11  whoie
    This is why I was suggesting not using fgets, because it requires two passes over the data, and the [f]scanf version only uses one so it should be about twice as fast as fgets. Although, the disk I/O totally dominates the overall timing (as expected), so that's really just interesting from a scientific point of view.

    This is all just for fun. If I have done something unfair, or if one of the contestants feels that I have unfairly represented them here, let me know. Or post your own results! I would like to see other comparisons!

    Will

  3. #18
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Code:
    $ gcc -W -Wall -ansi -pedantic -O2 foo.c
    $ ./a.out empty.txt nonewline.txt oneline.txt longline.txt foo.c missing.txt
    0 lines in file empty.txt
    1 lines in file nonewline.txt
    1 lines in file oneline.txt
    1 lines in file longline.txt
    31 lines in file foo.c
    Unable to open file missing.txt
    $ wc empty.txt nonewline.txt oneline.txt longline.txt foo.c
      0   0   0 empty.txt
      0   2  10 nonewline.txt
      1   2   9 oneline.txt
      1   8  35 longline.txt
     31 119 691 foo.c
     33 131 745 total
    And the code
    Code:
    #include <stdio.h>
    #include <string.h>
    
    void countlines ( char *filename ) {
      FILE *fp = fopen( filename, "r" );
      if ( fp != NULL ) {
        char buff[10];  /* small buff to show 'long' line handling */
        int nlines = 0;
        int partial = 0;
        while ( fgets( buff, sizeof(buff), fp ) != NULL ) {
          if ( strchr(buff,'\n') ) {
            nlines++;
            partial = 0;
          } else {
            partial = 1;
          }
        }
        nlines += partial;
        printf( "%d lines in file %s\n", nlines, filename );
        fclose( fp );
      } else {
        printf("Unable to open file %s\n", filename );
      }
    }
    
    int main( int argc, char *argv[] ) {
      while ( --argc ) {
        countlines( *++argv );
      }
      return 0;
    }
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #19
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    a generic programming approach:

    Code:
    #include <stdio.h>
    #include <string.h>
    
    size_t
    count(const char * str, const char * fnd)
    {
    	size_t 
    		cnt = 0,
    		len = strlen(fnd);
    		
    	while((str = strstr(str, fnd)) != 0)
    	{
    		str += len;
    		++cnt;
    	}
    	
    	return cnt;
    }
    
    size_t
    file_count(const char * inp, const char * fnd)
    {
    	size_t cnt = 0;
    	
    	const size_t max = 1024;
    
    	char buf[max + 1];
    
    	const char * aok = 0;	
    		
    	FILE * inf = fopen(inp, "r");
    	
    	if(inf)
    	{
    		aok = fgets(buf, max, inf);
    		
    		if(aok && *aok)
    		{
    			cnt = count(buf, fnd);
    			
    			while(fgets(buf, max, inf) != 0)
    			{
    				cnt += count(buf, fnd);
    			}
    		}
    		else
    		{
    			aok = 0;
    		}
    		
    		fclose(inf);
    	}
    	
    	return aok ? cnt : (size_t)-1;
    }
    
    size_t
    file_count_lines(const char * inp)
    {
    	return file_count(inp, "\n") + 1;
    }
    
    int
    main(int argc, char ** argv)
    {
    	while(*(++argv))
    	{
    		printf("'%s': %d\n", *argv, file_count_lines(*argv));
    	}
    }
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  5. #20
    Been here, done that.
    Join Date
    May 2003
    Posts
    1,164
    Quote Originally Posted by whoie
    Code:
    Flat profile:
    
    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total           
     time   seconds   seconds    calls   s/call   s/call  name    
     84.40      3.30     3.30        1     3.30     3.30  GetFileLineAmount
     12.79      3.80     0.50        1     0.50     0.50  waltp
      2.81      3.91     0.11        1     0.11     0.11  whoie
    This is why I was suggesting not using fgets, because it requires two passes over the data, and the [f]scanf version only uses one so it should be about twice as fast as fgets. Although, the disk I/O totally dominates the overall timing (as expected), so that's really just interesting from a scientific point of view.
    Hey Will, Try this code in your profile:
    Code:
    int waltp2(FILE *fp) {
        int nl = 0;
        int c;
        char buf[BUFSIZ];
        while ( fgets(buf, sizeof buf, fp) )
        {
            c = strlen(buf)-1;
            if ( buf[c] == '\n' ) ++nl;
        }
        if ( buf[c] != '\n' ) ++nl;
        return nl;
    }
    
    /* .......................................... */
    int waltp3(FILE *fp) {
        int nl = 0;
        int buf;
        int c;
        while ( (buf = fgetc(fp)) != EOF )
        {
            if ( buf == '\n' ) ++nl;
            c = buf;
        }
        if ( c != '\n' ) ++nl;
        return nl;
    }
    Not sure what this will show, but I didn't like the strchr() because it's probably slower than strlen(), although I've never tested it.

    In the rough test I performed I just used the time() function on a 3.5 M line file and got in seconds:
    15 (xeddiex)
    13 (waltp)
    13 (waltp2)
    15 (waltp3)
    20 (whoie)
    This pretty much shows strchr() and strlen() are fairly equivalent. That surprised me.

    whoie, xeddiex, and waltp3 all got the correct number of lines
    waltp, waltp1 both missed 1 line.

    On a short file (20 lines) all were correct, with and without a final \n. Don't know what's with the big file...


    The number of lines:
    Definition: Politics -- Latin, from
    poly meaning many and
    tics meaning blood sucking parasites
    -- Tom Smothers

  6. #21
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    I'm about a day late and a dollar short, but I might choose something like this, slightly tweaked:
    Code:
    int mycountlines(FILE *file)
    {
       int lines = 0;
       int ch, prev = '\n' /* so empty files have no lines */;
       while ( (ch = getc(file)) != EOF ) /* Read all chars in the file. */
       {
          if ( ch == '\n' )
          {
             ++lines; /* Bump the counter for every newline. */
          }
          prev = ch; /* Keep a copy to later test whether... */
       }
       if ( prev != '\n' ) /* ...the last line did not end in a newline. */
       {
          ++lines; /* If so, add one more to the total. */
       }
       return lines;
    }
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  7. #22
    Registered User
    Join Date
    Mar 2005
    Posts
    135
    Quote Originally Posted by whoie
    Well alrighty then!
    Will
    It's nice to see the test results - speed-wise, but, I'm afraid this wasn't about speed. My primary goal from the begining was to count the *actual* amount of non-newline-empty lines, but lines consisting of one or more characters. That is why mine came out differently.

    For example, the following file conatins a series of strings (lines) with arbitrary empty lines in between the file, at the begining, and the end:

    The empty lines which contain newlines ('\n') are denoted by *, and the end of the file is denoted by the constant symbol EOF:

    Code:
    FILE: winter.txt
    -------------------
    *
    one
    *
    two
    *
    three
    four
    five
    *
    six
    *
    seven
    eight
    *
    *
    *
    *
    *EOF
    ------------------
    As you can see, the *actual* amount of (useful) lines in the file are 8. And this is what my program counted, for waltp, whoie, salem, and Dave_Sinkula, they're programs counted 17 lines. What I understood from the OP (and adding a little bit of common sense) is that, one would only need to, or want to read in lines that are non-empty and, not the ones denoted by * (empty lines with newlines). And This was precisely my primary goal; to create a pure newline-proof algorithm.

    Thanks for the speed test results, though,
    - xeddiex

  8. #23
    Registered User
    Join Date
    Jul 2003
    Posts
    110
    xeddiex: Fair enough. I've excluded yours now since your specification is different.

    For the other "contestants", I mashed a couple of them slightly to fit them into my test driver. If I screwed up somewhere, please post corrections. I had to fix a couple implementation bugs that caused seg faults or compilation failures, so look closely.

    The correctness tests (using Salem's idea) involved an empty file, one with a newline after the last line, one without, one with one very big line, and the two IMDb files for crazy characters and huge input. Funny enough, wc pukes on the IMDb files because of those crazy characters, so I had to count newlines only. Here is the wc output:
    Code:
    whoie@linux:~/database> wc *.txt *.c
       0    0    0 empty.txt
       1    6 1785 longline.txt
       0    2   10 nonewline.txt
       1    2    9 oneline.txt
     138  442 2440 lc.c
      32  104  917 main.c
     172  556 5161 total
    whoie@linux:~/database> wc -l *.list
      2552802 actors.list
      1328278 actresses.list
      3881080 total
    notice that wc doesn't count lines according to our working line definition. The file without a newline after the last line yields zero.

    Here are the contestants:
    Code:
    #include <stdio.h>
    #include <string.h>
    
    /* .......................................... */
    int waltp(FILE *fp) {
    	int nl = 0;
    	char buf[BUFSIZ] = "";
    	while ( fgets(buf, sizeof buf, fp) )
    		if ( strchr(buf, '\n') ) ++nl;
    	if ( !strchr(buf, '\n') ) ++nl;
    	return nl;
    }
    
    /* .......................................... */
    int waltp2(FILE *fp) {
    	int nl = 0;
    	int c = 0;
    	char buf[BUFSIZ] = "";
    	while ( fgets(buf, sizeof buf, fp) )
    	{
    		c = strlen(buf)-1;
    		if ( buf[c] == '\n' ) ++nl;
    	}
    	if ( buf[c] != '\n' ) ++nl;
    	return nl;
    }
    
    /* .......................................... */
    int waltp3(FILE *fp) {
    	int nl = 0;
    	int buf;
    	int c;
    	while ( (buf = fgetc(fp)) != EOF )
    	{
    		if ( buf == '\n' ) ++nl;
    		c = buf;
    	}
    	if ( c != '\n' ) ++nl;
    	return nl;
    }
    
    /* .......................................... */
    int whoie(FILE *fp) {
    	int c;
    	int nl = 0;
    	while ( (c = getc(fp)) != EOF ) {
    		++nl;
    		if ( c != '\n' ) {
    			fscanf(fp, "%*[^\n]");
    			getc(fp);
    		}
    	}
    	return nl;
    }
    
    /* .......................................... */
    int Salem ( FILE *fp ) {
    	int nlines = 0;
    	int partial = 0;
    	char buff[BUFSIZ] = "";
    	while ( fgets( buff, sizeof(buff), fp ) != NULL ) {
    		if ( strchr(buff,'\n') ) {
    			nlines++;
    			partial = 0;
    		} else {
    			partial = 1;
    		}
    	}
    	return nlines += partial;
    }
    
    /* .......................................... */
    int Dave_Sinkula(FILE *file)
    {
    	int lines = 0;
    	int ch, prev = '\n' /* so empty files have no lines */;
    	while ( (ch = getc(file)) != EOF ) /* Read all chars in the file. */
    	{
    		if ( ch == '\n' )
    		{
    			++lines; /* Bump the counter for every newline. */
    		}
    		prev = ch; /* Keep a copy to later test whether... */
    	}
    	if ( prev != '\n' ) /* ...the last line did not end in a newline. */
    	{
    		++lines; /* If so, add one more to the total. */
    	}
    	return lines;
    } 
    
    /* .......................................... */
    size_t
    count(const char * str, const char * fnd)
    {
    	size_t cnt = 0, len = strlen(fnd);
    		
    	while((str = strstr(str, fnd)) != 0)
    	{
    		str += len;
    		++cnt;
    	}
    	
    	return cnt;
    }
    
    #define MAX 1024
    size_t
    file_count(FILE *inf, const char * fnd)
    {
    	size_t cnt = 0;
    	char buf[MAX + 1];
    	const char * aok = 0;	
    	
    	if(inf)
    	{
    		aok = fgets(buf, MAX, inf);
    		if(aok && *aok)
    		{
    			cnt = count(buf, fnd);	
    			while(fgets(buf, MAX, inf) != 0)
    			{
    				cnt += count(buf, fnd);
    			}
    		}
    		else
    		{
    			aok = 0;
    		}
    	}
    	
    	return aok ? cnt : (size_t)-1;
    }
    
    int Sebastiani(FILE *fp)
    {
    	return file_count(fp, "\n") + 1;
    }
    and the test driver:
    Code:
    #include <stdio.h>
    
    int whoie(FILE*), waltp(FILE*), waltp2(FILE*)
    	,waltp3(FILE*), Salem(FILE*), Sebastiani(FILE*), Dave_Sinkula(FILE*);
    
    int main(int argc, char *argv[]) {
    	FILE *fp = 0;
    	while ( --argc > 0 && (fp = fopen(*++argv, "r")) != 0 ) {
    		printf("File: %s\n", *argv);
    		printf("waltp found %d line(s).\n", waltp(fp));
    		rewind(fp);
    		printf("waltp2 found %d line(s).\n", waltp2(fp));
    		rewind(fp);
    		printf("waltp3 found %d line(s).\n", waltp3(fp));
    		rewind(fp);
    		printf("whoie found %d line(s).\n", whoie(fp));
    		rewind(fp);
    		printf("Dave_Sinkula found %d line(s).\n", Dave_Sinkula(fp));
    		rewind(fp);
    		printf("Sebastiani found %d line(s).\n", Sebastiani(fp));
    		rewind(fp);
    		printf("Salem found %d line(s).\n\n", Salem(fp));
    		fclose(fp);
    	}
    	if ( argc != 0 ) printf("Unable to open %s\n", *argv);
    
    	return 0;
    }
    Here are the correctness test results:
    Code:
    whoie@linux:~/database> gcc -W -Wall -ansi -pedantic -c *.c
    whoie@linux:~/database> gcc -O3 -pg *.c -o lc
    whoie@linux:~/database> ./lc *.txt *.c *.list
    File: empty.txt
    waltp found 1 line(s).
    waltp2 found 1 line(s).
    waltp3 found 1 line(s).
    whoie found 0 line(s).
    Dave_Sinkula found 0 line(s).
    Sebastiani found 0 line(s).
    Salem found 0 line(s).
    
    File: longline.txt
    waltp found 1 line(s).
    waltp2 found 1 line(s).
    waltp3 found 1 line(s).
    whoie found 1 line(s).
    Dave_Sinkula found 1 line(s).
    Sebastiani found 2 line(s).
    Salem found 1 line(s).
    
    File: nonewline.txt
    waltp found 1 line(s).
    waltp2 found 1 line(s).
    waltp3 found 1 line(s).
    whoie found 1 line(s).
    Dave_Sinkula found 1 line(s).
    Sebastiani found 1 line(s).
    Salem found 1 line(s).
    
    File: oneline.txt
    waltp found 1 line(s).
    waltp2 found 1 line(s).
    waltp3 found 1 line(s).
    whoie found 1 line(s).
    Dave_Sinkula found 1 line(s).
    Sebastiani found 2 line(s).
    Salem found 1 line(s).
    
    File: lc.c
    waltp found 138 line(s).
    waltp2 found 138 line(s).
    waltp3 found 138 line(s).
    whoie found 138 line(s).
    Dave_Sinkula found 138 line(s).
    Sebastiani found 139 line(s).
    Salem found 138 line(s).
    
    File: main.c
    waltp found 32 line(s).
    waltp2 found 32 line(s).
    waltp3 found 32 line(s).
    whoie found 32 line(s).
    Dave_Sinkula found 32 line(s).
    Sebastiani found 33 line(s).
    Salem found 32 line(s).
    
    File: actors.list
    waltp found 2552802 line(s).
    waltp2 found 2552802 line(s).
    waltp3 found 2552802 line(s).
    whoie found 2552802 line(s).
    Dave_Sinkula found 2552802 line(s).
    Sebastiani found 2552803 line(s).
    Salem found 2552802 line(s).
    
    File: actresses.list
    waltp found 1328278 line(s).
    waltp2 found 1328278 line(s).
    waltp3 found 1328278 line(s).
    whoie found 1328278 line(s).
    Dave_Sinkula found 1328278 line(s).
    Sebastiani found 1328279 line(s).
    Salem found 1328278 line(s).
    Finally, just to show the relative importance (or unimportance) of speed in this, here is a flat profile of all contestants plus the standard library:
    Code:
    Flat profile:
    
    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total           
     time   seconds   seconds    calls   s/call   s/call  name    
     22.74     54.99    54.99 318576683     0.00     0.00  getc
     10.83     81.18    26.20  3135302     0.00     0.00  _IO_vfscanf
      6.61     97.16    15.98        8     2.00     6.92  Dave_Sinkula
      6.36    112.55    15.39                             __getclktck
      6.20    127.54    14.99 337237166     0.00     0.00  flockfile
      5.93    141.87    14.33        8     1.79     6.71  waltp3
      5.41    154.95    13.08                             __profile_frequency
      4.91    166.84    11.89 15669489     0.00     0.00  memchr
      4.20    177.00    10.16 15669489     0.00     0.00  memcpy
      3.78    186.13     9.13   266321     0.00     0.00  read
      3.77    195.25     9.12 337237166     0.00     0.00  funlockfile
      3.63    204.02     8.77  7762514     0.00     0.00  index
      3.04    211.38     7.36  3135302     0.00     0.00  vfscanf
      2.31    216.97     5.59 15525045     0.00     0.00  fgets
      2.21    222.30     5.34 15525045     0.00     0.00  _IO_getline_info
      2.02    227.19     4.89  7762514     0.00     0.00  strstr
      1.51    230.83     3.64  3135326     0.00     0.00  memset
      1.36    234.11     3.29                             _IO_wfile_overflow
      0.95    236.41     2.30        8     0.29     1.58  waltp2
      0.74    238.19     1.78 15525045     0.00     0.00  _IO_getline
      0.24    238.76     0.57        8     0.07     1.91  waltp
      0.21    239.26     0.50        8     0.06     2.01  Sebastiani
      0.15    239.63     0.37  3881270     0.00     0.00  strlen
      0.14    239.97     0.34        8     0.04     1.88  Salem
      0.13    240.27     0.31  3135301     0.00     0.00  _IO_sputbackc
      0.12    240.55     0.28        8     0.04     5.17  whoie
      0.11    240.81     0.26  3135302     0.00     0.00  fscanf
      0.10    241.04     0.23        8     0.03     0.03  fclose
      0.06    241.19     0.15                             mempcpy
      0.06    241.34     0.15        8     0.02     0.02  fopen
      0.04    241.44     0.11   266321     0.00     0.00  _IO_file_underflow
      0.04    241.55     0.11                             __default_morecore
      0.03    241.62     0.08   266321     0.00     0.00  _IO_file_read
      0.02    241.66     0.04   266321     0.00     0.00  __uflow
      0.02    241.70     0.04       64     0.00     0.00  printf
      0.01    241.74     0.04                             dprintf
      0.01    241.77     0.03   266321     0.00     0.00  _IO_switch_to_get_mode
      0.01    241.79     0.03                             _IO_default_seekoff
      0.01    241.81     0.02   266321     0.00     0.00  _IO_default_uflow
      0.00    241.82     0.01       48     0.00     0.00  llseek
      0.00    241.83     0.01                             __libc_register_dlfcn_hook
      0.00    241.84     0.01                             memmove
      0.00    241.84     0.01       71     0.00     0.00  _IO_file_overflow
      0.00    241.85     0.01        9     0.00     0.00  _IO_file_stat
      0.00    241.85     0.01                             _IO_default_underflow
      0.00    241.85     0.00      192     0.00     0.00  _IO_new_file_xsputn
      0.00    241.85     0.00      128     0.00     0.00  __find_specmb
      0.00    241.85     0.00       74     0.00     0.00  __errno_location
      0.00    241.85     0.00       65     0.00     0.00  _IO_new_do_write
      0.00    241.85     0.00       64     0.00     0.00  _IO_file_write
      0.00    241.85     0.00       64     0.00     0.00  new_do_write
      0.00    241.85     0.00       64     0.00     0.00  vfprintf
      0.00    241.85     0.00       64     0.00     0.00  write
      0.00    241.85     0.00       56     0.00     0.00  _itoa_word
      0.00    241.85     0.00       48     0.00     0.00  _IO_file_seek
      0.00    241.85     0.00       48     0.00     0.00  _IO_file_seekoff
      0.00    241.85     0.00       48     0.00     0.00  _IO_seekoff_unlocked
      0.00    241.85     0.00       48     0.00     0.00  rewind
      0.00    241.85     0.00       24     0.00     0.00  _IO_un_link
      0.00    241.85     0.00       17     0.00     0.00  _IO_setb
      0.00    241.85     0.00       16     0.00     0.00  _IO_link_in
      0.00    241.85     0.00       10     0.00     0.00  _int_malloc
      0.00    241.85     0.00       10     0.00     0.00  malloc
      0.00    241.85     0.00        9     0.00     0.00  _IO_doallocbuf
      0.00    241.85     0.00        9     0.00     0.00  _IO_file_doallocate
      0.00    241.85     0.00        9     0.00     0.00  ___fxstat64
      0.00    241.85     0.00        9     0.00     0.00  mmap
      0.00    241.85     0.00        8     0.00     0.00  _IO_default_finish
      0.00    241.85     0.00        8     0.00     0.00  _IO_file_close
      0.00    241.85     0.00        8     0.00     0.00  _IO_file_finish
      0.00    241.85     0.00        8     0.00     0.00  _IO_file_open
      0.00    241.85     0.00        8     0.00     0.00  _IO_new_file_close_it
      0.00    241.85     0.00        8     0.00     0.00  _IO_new_file_fopen
      0.00    241.85     0.00        8     0.00     0.00  _IO_new_file_init
      0.00    241.85     0.00        8     0.00     0.00  _IO_no_init
      0.00    241.85     0.00        8     0.00     0.00  _IO_old_init
      0.00    241.85     0.00        8     0.00     0.00  _IO_unsave_markers
      0.00    241.85     0.00        8     0.00     0.00  __fopen_internal
      0.00    241.85     0.00        8     0.00     0.00  __fopen_maybe_mmap
      0.00    241.85     0.00        8     0.00     0.00  _int_free
      0.00    241.85     0.00        8     0.00     0.00  cfree
      0.00    241.85     0.00        8     0.00     0.00  getenv
      0.00    241.85     0.00        8     0.00     0.00  munmap
      0.00    241.85     0.00        8     0.00     0.00  open
      0.00    241.85     0.00        8     0.00     0.00  sYSTRIm
      0.00    241.85     0.00        8     0.00     0.00  strncmp
      0.00    241.85     0.00        3     0.00     0.00  __cxa_atexit
      0.00    241.85     0.00        3     0.00     0.00  __new_exitfn
      0.00    241.85     0.00        1     0.00     0.00  _IO_default_xsputn
      0.00    241.85     0.00        1     0.00     0.00  __init_misc
      0.00    241.85     0.00        1     0.00     0.00  __libc_csu_fini
      0.00    241.85     0.00        1     0.00     0.00  __libc_csu_init
      0.00    241.85     0.00        1     0.00     0.00  __libc_init_first
      0.00    241.85     0.00        1     0.00     0.00  __libc_init_secure
      0.00    241.85     0.00        1     0.00     0.00  __libc_setup_tls
      0.00    241.85     0.00        1     0.00     0.00  __libc_sigaction
      0.00    241.85     0.00        1     0.00   209.76  __libc_start_main
      0.00    241.85     0.00        1     0.00     0.00  __pthread_initialize_minimal
      0.00    241.85     0.00        1     0.00     0.00  __setfpucw
      0.00    241.85     0.00        1     0.00     0.00  _dl_aux_init
      0.00    241.85     0.00        1     0.00     0.00  _dl_important_hwcaps
      0.00    241.85     0.00        1     0.00     0.00  _dl_init_paths
      0.00    241.85     0.00        1     0.00     0.00  _dl_non_dynamic_init
      0.00    241.85     0.00        1     0.00     0.00  _mcleanup
      0.00    241.85     0.00        1     0.00     0.00  atexit
      0.00    241.85     0.00        1     0.00     0.00  exit
      0.00    241.85     0.00        1     0.00     0.00  fini
      0.00    241.85     0.00        1     0.00     0.00  init
      0.00    241.85     0.00        1     0.00   209.76  main
      0.00    241.85     0.00        1     0.00     0.00  moncontrol
      0.00    241.85     0.00        1     0.00     0.00  setitimer
      0.00    241.85     0.00        1     0.00     0.00  sigaction
      0.00    241.85     0.00        1     0.00     0.00  strrchr
      0.00    241.85     0.00        1     0.00     0.00  uname
    Look closely at the entry for my solution:
    Code:
      %   cumulative   self              self     total           
     time   seconds   seconds    calls   s/call   s/call  name    
      0.12    240.55     0.28        8     0.04     5.17  whoie
    Notice that the time spent in my function alone is the least (for this run), but that the fscanf call totally overwhelms it! The number of seconds spent total is 5.17, and looking at the scanf family calls in the profile list, they are all ranked well above the cummulative total for fgets (and more than one contestant uses fgets)!

    In the end, it appears that Dave_Sinkula, Salem, and whoie pass all correctness tests for our working definition of a line. The other entries probably only need minor corrections to pass all tests. Salem's is the clear choice for efficiency. Nice job everyone! Congratulations Salem!

    HTH,
    Will

  9. #24
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    Quote Originally Posted by whoie
    Salem's is the clear choice for efficiency.
    Thanks for the profiling, I like to see such comparisons. But 'efficiency' things will get you to QoI issues as well. For example, I can produce this output from the same code.
    Code:
    /* Borland
    GetFileLineAmount    : 53463 line(s), diff = 0.16
    waltp                : 54226 line(s), diff = 0.151
    whoie                : 54226 line(s), diff = 0.32
    countlines           : 54226 line(s), diff = 0.24
    mycountlines         : 54226 line(s), diff = 0.101
    */
    
    /* GnuC
    GetFileLineAmount    : 53463 line(s), diff = 0.62
    waltp                : 54226 line(s), diff = 0.121
    whoie                : 54226 line(s), diff = 0.23
    countlines           : 54226 line(s), diff = 0.19
    mycountlines         : 54226 line(s), diff = 0.571
    */
    Just stuff to be aware of as well.

    I think there was a contest a while back in which my use of standard functions was handily beaten by OS-specific stuff.
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  10. #25
    Registered User
    Join Date
    Jul 2003
    Posts
    110
    Thanks for posting your comparisons Dave, that is interesting to compare. Is diff the difference in time from call to return?

  11. #26
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    Yeah, using clock.
    Code:
    #define BIND(x) {x,#x}
    
    void test(FILE *file)
    {
       static const struct sFunction
       {
          int (*call)(FILE*);
          const char *text;
       } foo[] =
       {
          BIND(GetFileLineAmount),
          BIND(waltp),
          BIND(whoie),
          BIND(countlines),
          BIND(mycountlines),
       };
       size_t i;
       for ( i = 0; i < sizeof foo / sizeof *foo; ++i )
       {
          int (*function)(FILE*) = foo[i].call;
          clock_t start = clock();
          int lines = function(file);
          double diff = (clock() - start) / (double)CLOCKS_PER_SEC;
          printf("%-20s : %d line(s), diff = %g\n", foo[i].text, lines, diff);
          rewind(file);
       }
    }
    
    int main(void)
    {
       FILE *fp = fopen("war_and_peace.txt", "r");
       if ( fp )
       {
          test(fp);
       }
       return 0;
    }
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  12. #27
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    maybe my head is stuck in 'text editor' mode - I was thinking in terms of logical lines whereas:

    empty = 0
    nonewline = 1
    one-newline = 2

    make sense to me, anyway. :-)

    nice job of profiling, btw.
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Imposing Line Numbers automatically in the C code
    By cavestine in forum C Programming
    Replies: 14
    Last Post: 10-15-2007, 12:41 AM
  2. Adding Line numbers in Word
    By Mister C in forum A Brief History of Cprogramming.com
    Replies: 24
    Last Post: 06-24-2004, 08:45 PM
  3. Trouble replacing line of file
    By Rpog in forum C Programming
    Replies: 4
    Last Post: 04-19-2004, 10:22 AM
  4. Contest Results - May 27, 2002
    By ygfperson in forum A Brief History of Cprogramming.com
    Replies: 18
    Last Post: 06-18-2002, 01:27 PM
  5. follow up with char I/O and line, word, and char counting
    By Led Zeppelin in forum C Programming
    Replies: 1
    Last Post: 03-23-2002, 09:26 PM