Thread: A little optimization challenge.

Hybrid View

Previous Post Previous Post   Next Post Next Post
  1. #1
    Registered User
    Join Date
    Sep 2020
    Posts
    425

    A little optimization challenge.

    This challenge is inspired by C-UL8R's thread at How can I improve/shorten my C code?

    The challenge is to count the number of vowels and consonants in the text file that contains the works of William Shakespeare. The text file can be grabbed from https://ocw.mit.edu/ans7870/6/6.006/...hakespeare.txt

    Here's the reference implementation:

    Code:
    /***********************************************
    * charcount.c : count consonants and vowels in a file
    ***********************************************/
    #include <stdio.h>
    #include <stdint.h>
    #include <ctype.h>
    #include <sys/time.h>
    
    
    static char fname[] = "t8.shakespeare.txt";
    
    
    static int isVowel(int c) {
      c = tolower(c);
      return (c == 'a') || (c == 'e') || (c == 'i') || (c == 'o') || (c == 'u');
    }
    
    
    static int isConsonant(int c) {
      if(isVowel(c)) return 0;
      return isalpha(c);
    }
    
    
    int main(int argc, char *argv[]) {
       struct timeval start, end;
       uint32_t vowelCount = 0;
       uint32_t conCount = 0;
    
       gettimeofday(&start, NULL);
    
       FILE *f = fopen(fname,"r");
       if(f == NULL) {
          printf("Unable to open %s\n",fname);
          printf("\n");
          printf("Please download the test file from \n");
          printf("https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt\n");
          return 0;
       }
    
       //=========== Start of challenge code =========
       int c = getc(f);
       while(c != EOF) {
          if(isVowel(c)) {
             vowelCount++;
          }
          if(isConsonant(c)) {
             conCount++;
          }
          c = getc(f);
       }
       //=========== End of challenge code =========
    
       fclose(f);
    
       gettimeofday(&end, NULL);
    
       printf("There are %u consonants and %u vowels in the Complete Works of William Shakespeare\n",conCount, vowelCount);
       printf("Time taken = %li microseconds\n", (long int)(end.tv_sec-start.tv_sec)*1000000+(end.tv_usec-start.tv_usec));
    }
    Of course you can add as many functions or other code as you like. Code size or memory usage isn't a concern, it just has to be portable C code.

    Scoring will be a honesty system, and just be a "this is X times better than the reference code on my machine". It is expected that you will compile both code with the same set of compiler options, with optimizations turned on.

    e.g. On my i3-3110M laptop the above code takes 98733 microseconds to run. If I write an improved version that takes 12345 microseconds to run it is 98733/12345 = 7.99x the speed of the original.

    For completeness sake, here is the answer I get - I hope it's right!

    Code:
    There are 2357034 consonants and 1433891 vowels in the Complete Works of William Shakespeare
    Time taken = 98733 microseconds
    I'm guessing a 5x is easily possible, a 10x speedup should be achievable, but I'll be very surprised if anybody gets a 20x speedup without resorting to using multiple threads or CPU specific features.

    I'll post my best version in about a week...

  2. #2
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,787
    With compiler optimisations turned on or off?

  3. #3
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    Quote Originally Posted by Hodor View Post
    With compiler optimisations turned on or off?
    As per the OP:

    It is expected that you will compile both code with the same set of compiler options, with optimizations turned on.

  4. #4
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,787
    Ok, as a start I'll begin with this (no compiler optimisations)

    Code:
       //=========== Start of challenge code =========
       
       int c;
       while((c = getc(f)) != EOF) {
            if (isalpha(c)) {
               int lc = tolower(c);
               int iv = lc == 'e' || lc == 'a' || lc == 'o' || lc == 'i' || lc == 'u';
               vowelCount += iv;
               conCount += !iv;
            }
       }
       //=========== End of challenge code =========
    The original code:

    Code:
    $./a.out 
    There are 2357034 consonants and 1433891 vowels in the Complete Works of William Shakespeare
    Time taken = 102125 microseconds
    My modified code:

    Code:
    $ ./a.out 
    There are 2357034 consonants and 1433891 vowels in the Complete Works of William Shakespeare
    Time taken = 32977 microseconds
    Only optimisation I made, really, is ordering the checks for a vowel based on English vowel frequency; i.e. I'm not concerned about much, right now, apart from how often vowels appear in typical English text. So, no real code changes but an improvement.

    Edit: So I guess I didn't optimise it at all. Or did I?

    Edit2: meh... compiler optimisations were turned off. My bad. My non-code-related changes are still faster with -O2 though.
    Last edited by Hodor; 10-02-2020 at 04:38 AM.

  5. #5
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    My best shot, so far.

    The 'original' is the code above. 'count' is mine. Pure C, portable (I think)...

    I'll post when you post yours!

    Code:
    $ ./original; echo; ./count
    Cycles: 223980894.
    There are 2357034 consonants and 1433891 vowels in the Complete Works of William Shakespeare
    Time taken = 66048 μs
    
    Cycles: 22487090.
    There are 2357034 consonants and 1433891 vowels in the Complete Works of William Shakespeare.
    Time: 6671 μs.
    Last edited by flp1969; 10-02-2020 at 06:45 AM.

  6. #6
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by flp1969
    I'll post when you post yours!
    Clearly, you need a code escrow service
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  7. #7
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    Ok... here it is my version:
    Code:
    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <stdlib.h>
    #include <sys/time.h>
    
    #ifdef DEBUG
    # include <inttypes.h>
    # include "cycle_counting.h"
    #endif
    
    #define FILENAME      "t8.shakespeare.txt"
    
    // Ideal buffer size?!
    // I got the best results for disk I/O with this size.
    #define BUFFER_SIZE   16384
    
    // is_vowel() and is_consonant() assumes ASCII.
    // Technically this is not "portable", since we could've been dealing
    // with EBCDIC, for example!
    // But I think is "portable" enough.
    //
    // Also, don't challenge the default word size because using
    // _Bool will slow things down.
    static int is_vowel( unsigned char c )
    {
      // 1 KiB array to avoid "special" characters used in
      // user's system (works with UTF-8 too).
      static const int vowels[] =
      {
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
        0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
        0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
      };
    
      return vowels[c];
    }
    
    static int is_consonant( unsigned char c )
    {
      // 1 KiB array to avoid "special" characters used in
      // user's system (works with UTF-8 too).
      static const int consonants[] =
      {
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0,
        1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0,
        0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0,
        1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
      };
    
      return consonants[c];
    }
    
    int main( void )
    {
      static char buffer[BUFFER_SIZE];
      char *p;
      int fd;
      ssize_t size;
      unsigned long int v, c;   // 'long int' is 64 bits long on x86-64.
                                // I could use 'unsigned int' to speed things up, but
                                // I don't know if the counts would overflow.
      struct timeval start, end;
    
      gettimeofday( &start, NULL );
    
    #ifdef DEBUG
      counter_T c1;
    
      c1 = BEGIN_TSC();
    #endif
    
      // NOTE: Using file descriptors and syscalls
      //       because FILE streams are too slow.
      if ( ( fd = open( FILENAME, O_RDONLY ) ) < 0 )
      {
        fputs( "ERROR opening file.\n", stderr );
        return EXIT_FAILURE;
      }
    
      v = c = 0;
      while ( ( size = read( fd, buffer, BUFFER_SIZE ) ) > 0 )
      {
        p = buffer;
    
        while ( size-- )
        {
          // NOTE: 'else' is necessary not to do the test twice!
          //       And since we have more consonants than vowels (I think),
          //       the order of these tests are important.
          if ( is_consonant( *p ) )
            c++;
          else
            if ( is_vowel( *p ) )
              v++;
    
          p++;
        }
      }
    
      close( fd );
    
    #ifdef DEBUG
      END_TSC( &c1 );
      printf( "Cycles: %" PRIu64 ".\n", c1 );
    #endif
    
      gettimeofday( &end, NULL );
    
      printf( "There are %lu consonants and %lu vowels in the Complete Works of William Shakespeare.\n", c, v );
      printf( "Time: %ld μs.\n", ( long int )( end.tv_sec - start.tv_sec ) * 1000000 + ( end.tv_usec - start.tv_usec ) );
    
      return EXIT_SUCCESS;
    }
    cycle_counting.h is this one:
    Code:
    #ifndef CYCLE_COUNTING_INCLUDED__
    #define CYCLE_COUNTING_INCLUDED__
    
    #ifndef __GNUC__
    # error Works only on GCC
    #endif
    
    /* ==========================================
        Quick & Dirty cycle counting...
    
        As funções usadas para contar a quantidade de ciclos
        de clock gastos num bloco de código.
    
        Exemplo de uso:
    
          counter_T cnt;
    
          cnt = BEGIN_TSC();      
          f();
          END_TSC(&cnt);
    
        Defina SYNC_MEM se quiser uma serialização mais completa (mfence).
       ========================================== */
    #include <stdint.h>
    
    // É necessário usar esse "tipo" para os contadores
    // porque o compilador vai tentar otimizar qualquer variável
    // não volátil.
    typedef volatile uint64_t counter_T;
    
    #if defined(__x86_64__)
      inline counter_T BEGIN_TSC( void )
      {
        uint32_t a, d;
    
        __asm__ __volatile__ (
      #ifdef SYNC_MEM
          "mfence\n\t"
      #endif
          "cpuid\n\t"
          "rdtsc" : "=a" (a), "=d" (d) : "a" (0) : "rbx", "rcx"
        );
    
        return a | ((uint64_t)d << 32);
      }
    
      inline void END_TSC( counter_T *cptr )
      {
        uint32_t a, d;
    
        __asm__ __volatile__ (
          "rdtscp" : "=a" (a), "=d" (d) :: "rcx"
        );
    
        *cptr = (a | ((uint64_t)d << 32)) - *cptr;;
      }
    #elif defined(__i386__)
      inline counter_T BEGIN_TSC( void )
      {
        uint32_t a, d;
    
        __asm__ __volatile__ (
      // Só usa mfence se o processador suportar SSE2.
      #ifdef __SSE2__
      #  ifdef SYNC_MEM
          "mfence\n\t"
      #  endif
      #endif
          "cpuid\n\t"
          "rdtsc" : "=a" (a), "=d" (d) : "a" (0) : "ebx", "ecx"
        );
    
        return a | ((uint64_t)d << 32);
      }
    
      inline void END_TSC( counter_T *cptr )
      {
        uint32_t a, d;
    
        __asm__ __volatile__ (
          "rdtscp" : "=a" (a), "=d" (d) :: "ecx"
        );
    
        *cptr = (a | ((uint64_t)d << 32)) - *cptr;
      }
    #elif __arm__
      #if __ARM_32BIT_STATE == 1
        // This works only on ARMv8
        #if __ARM_ARCH__ > 7
          inline counter_T BEGIN_TSC( void )
          {
            unsigned int r0, r1;
    
            __asm__ __volatile__ (
          #ifdef SYNC_MEM
              "dmb\n\t"
          #endif
              "mrrc p15,1,%0,%1,c14"        // FIXME: Must check this.
                                            //        Get the virtual counter.
              : "=r" (r0), "=r" (r1)
            );
    
            return ((uint64_t)r1 << 32) | r0;
          }
    
          inline void END_TSC( counter_T *cptr )
          {
            unsigned int r0, r1;
    
            __asm__ __volatile__ (
              "mrrc p15,1,%0,%1,c14"      // FIXME: Must check this.
                                          //        get the virtual counter.
              : "=r" (r0), "=r" (r1)
            );
    
            *cptr = (((uint64_t)r1 << 32) | r0) - *cptr;
          }
        #else
          #error ARMv8 or superior only.
        #endif
      #else   // otherwise we are in aarch64 mode.
        inline counter_T BEGIN_TSC( void )
        {
          uint64_t count;
    
          __asm__ __volatile__ ( 
        #ifdef SYNC_MEM
          "dmb\n\t"
        #endif
          "mrs %0,cntvct_el0" : "=r" (count) );
    
          return count;
        }
    
        inline void END_TSC( counter_T *cptr )
        {
          uint64_t count;
    
          __asm__ __volatile__ ( "mrs %0,cntvct_el0" : "=r" (count) );
    
          *cptr = count - *cptr;
        }
      #endif
    #else
    # error i386, x86-64 and ARM only.
    #endif
    
    #endif
    And here's the Makefile:
    Code:
    # You said otimizations are allowed!!!
    CFLAGS=-O2 -march=native -mtune=native -ftree-vectorize -fno-stack-protector
    LDFLAGS=-s
    
    .PHONY: all clean distclean
    
    all: original count
    
    original: original.o
        $(CC) $(LDFLAGS) -o $@ $^
    
    count: count.o
        $(CC) $(LDFLAGS) -o $@ $^
    
    count.o: count.c cycle_counting.h
    original.o: original.c cycle_counting.h
    
    clean:
        -rm *.o
    
    distclean: clean
        rm original count
    Of course, I added the cycle counting on your original code as well.. Just don't define DEBUG and the code above is pure C.
    Last edited by flp1969; 10-02-2020 at 07:05 AM.

  8. #8
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,787
    Quote Originally Posted by flp1969 View Post
    Ok... here it is my version:
    Code:
    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/time.h>
    
    #ifdef DEBUG
    # include <inttypes.h>
    # include "cycle_counting.h"
    #endif
    
    #define FILENAME      "t8.shakespeare.txt"
    
    // Ideal buffer size?!
    // I got the best results for disk I/O with this size.
    #define BUFFER_SIZE   16384
    
    // is_vowel() and is_consonant() assumes ASCII.
    // Technically this is not "portable", since we could've been dealing
    // with EBCDIC, for example!
    // But I think is "portable" enough.
    //
    // Also, don't challenge the default word size because using
    // _Bool will slow things down.
    static int is_vowel( unsigned char c )
    {
      // 1 KiB array to avoid "special" characters used in
      // user's system (works with UTF-8 too).
      static const int vowels[] =
      {
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
        0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
        0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
      };
    
      return vowels[c];
    }
    
    static int is_consonant( unsigned char c )
    {
      // 1 KiB array to avoid "special" characters used in
      // user's system (works with UTF-8 too).
      static const int consonants[] =
      {
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0,
        1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0,
        0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0,
        1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
      };
    
      return consonants[c];
    }
    
    int main( void )
    {
      static char buffer[BUFFER_SIZE];
      char *p;
      int fd;
      ssize_t size;
      unsigned long int v, c;   // 'long int' is 64 bits long on x86-64.
                                // I could use 'unsigned int' to speed things up, but
                                // I don't know if the counts would overflow.
      struct timeval start, end;
    
      gettimeofday( &start, NULL );
    
    #ifdef DEBUG
      counter_T c1;
    
      c1 = BEGIN_TSC();
    #endif
    
      // NOTE: Using file descriptors and syscalls
      //       because FILE streams are too slow.
      fd = open( FILENAME, O_RDONLY );
    
      v = c = 0;
      while ( ( size = read( fd, buffer, BUFFER_SIZE ) ) > 0 )
      {
        p = buffer;
    
        while ( size-- )
        {
          // NOTE: 'else' is necessary not to do the test twice!
          //       And since we have more consonants than vowels (I think),
          //       the order of these tests are important.
          if ( is_consonant( *p ) )
            c++;
          else
            if ( is_vowel( *p ) )
              v++;
    
          p++;
        }
      }
    
      close( fd );
    
    #ifdef DEBUG
      END_TSC( &c1 );
      printf( "Cycles: %" PRIu64 ".\n", c1 );
    #endif
    
      gettimeofday( &end, NULL );
    
      printf( "There are %lu consonants and %lu vowels in the Complete Works of William Shakespeare.\n", c, v );
      printf( "Time: %ld μs.\n", ( long int )( end.tv_sec - start.tv_sec ) * 1000000 + ( end.tv_usec - start.tv_usec ) );
    
      return 0;
    }
    cycle_counting.h is this one:
    Code:
    #ifndef CYCLE_COUNTING_INCLUDED__
    #define CYCLE_COUNTING_INCLUDED__
    
    #ifndef __GNUC__
    # error Works only on GCC
    #endif
    
    /* ==========================================
        Quick & Dirty cycle counting...
    
        As funções usadas para contar a quantidade de ciclos
        de clock gastos num bloco de código.
    
        Exemplo de uso:
    
          counter_T cnt;
    
          cnt = BEGIN_TSC();      
          f();
          END_TSC(&cnt);
    
        Defina SYNC_MEM se quiser uma serialização mais completa (mfence).
       ========================================== */
    #include <stdint.h>
    
    // É necessário usar esse "tipo" para os contadores
    // porque o compilador vai tentar otimizar qualquer variável
    // não volátil.
    typedef volatile uint64_t counter_T;
    
    #if defined(__x86_64__)
      inline counter_T BEGIN_TSC( void )
      {
        uint32_t a, d;
    
        __asm__ __volatile__ (
      #ifdef SYNC_MEM
          "mfence\n\t"
      #endif
          "cpuid\n\t"
          "rdtsc" : "=a" (a), "=d" (d) : "a" (0) : "rbx", "rcx"
        );
    
        return a | ((uint64_t)d << 32);
      }
    
      inline void END_TSC( counter_T *cptr )
      {
        uint32_t a, d;
    
        __asm__ __volatile__ (
          "rdtscp" : "=a" (a), "=d" (d) :: "rcx"
        );
    
        *cptr = (a | ((uint64_t)d << 32)) - *cptr;;
      }
    #elif defined(__i386__)
      inline counter_T BEGIN_TSC( void )
      {
        uint32_t a, d;
    
        __asm__ __volatile__ (
      // Só usa mfence se o processador suportar SSE2.
      #ifdef __SSE2__
      #  ifdef SYNC_MEM
          "mfence\n\t"
      #  endif
      #endif
          "cpuid\n\t"
          "rdtsc" : "=a" (a), "=d" (d) : "a" (0) : "ebx", "ecx"
        );
    
        return a | ((uint64_t)d << 32);
      }
    
      inline void END_TSC( counter_T *cptr )
      {
        uint32_t a, d;
    
        __asm__ __volatile__ (
          "rdtscp" : "=a" (a), "=d" (d) :: "ecx"
        );
    
        *cptr = (a | ((uint64_t)d << 32)) - *cptr;
      }
    #elif __arm__
      #if __ARM_32BIT_STATE == 1
        // This works only on ARMv8
        #if __ARM_ARCH__ > 7
          inline counter_T BEGIN_TSC( void )
          {
            unsigned int r0, r1;
    
            __asm__ __volatile__ (
          #ifdef SYNC_MEM
              "dmb\n\t"
          #endif
              "mrrc p15,1,%0,%1,c14"        // FIXME: Must check this.
                                            //        Get the virtual counter.
              : "=r" (r0), "=r" (r1)
            );
    
            return ((uint64_t)r1 << 32) | r0;
          }
    
          inline void END_TSC( counter_T *cptr )
          {
            unsigned int r0, r1;
    
            __asm__ __volatile__ (
              "mrrc p15,1,%0,%1,c14"      // FIXME: Must check this.
                                          //        get the virtual counter.
              : "=r" (r0), "=r" (r1)
            );
    
            *cptr = (((uint64_t)r1 << 32) | r0) - *cptr;
          }
        #else
          #error ARMv8 or superior only.
        #endif
      #else   // otherwise we are in aarch64 mode.
        inline counter_T BEGIN_TSC( void )
        {
          uint64_t count;
    
          __asm__ __volatile__ ( 
        #ifdef SYNC_MEM
          "dmb\n\t"
        #endif
          "mrs %0,cntvct_el0" : "=r" (count) );
    
          return count;
        }
    
        inline void END_TSC( counter_T *cptr )
        {
          uint64_t count;
    
          __asm__ __volatile__ ( "mrs %0,cntvct_el0" : "=r" (count) );
    
          *cptr = count - *cptr;
        }
      #endif
    #else
    # error i386, x86-64 and ARM only.
    #endif
    
    #endif
    And here's the Makefile:
    Code:
    # You said otimizations are allowed!!!
    CFLAGS=-O2 -march=native -mtune=native -ftree-vectorize -fno-stack-protector
    LDFLAGS=-s
    
    .PHONY: all clean distclean
    
    all: original count
    
    original: original.o
        $(CC) $(LDFLAGS) -o $@ $^
    
    count: count.o
        $(CC) $(LDFLAGS) -o $@ $^
    
    count.o: count.c
    original.o: original.c
    
    clean:
        -rm *.o
    
    distclean: clean
        rm original count
    Of course, I added the cycle counting on your original code as well.. Just don't define DEBUG and the code above is pure C.

    That's stupid, flp1969. You have branches (e.g. if ( is_vowel( *p ) )) in that code so it's just... stupid.

  9. #9
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    Quote Originally Posted by Hodor View Post
    That's stupid, flp1969. You have branches (e.g. if ( is_vowel( *p ) )) in that code so it's just... stupid.
    Well... do better then...

  10. #10
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,787
    Quote Originally Posted by flp1969 View Post
    Well... do better then...

    I don't think I can

  11. #11
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,787
    The point I was trying to make was that optimisations do not necessarily come from "code"; they can come from other places (i.e. the order you compare aeiou can make a huge difference because of "short circuiting"). The order things are checked is, in my opinion, can be more important than the code itself. I.e. I've trivialised it but the algorithm (and in this instance the assumptions it makes) is often more improtant than the code.


    That said, your code is pretty impressive.

  12. #12
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    PS: Improved the code a little more... now it is 10x faster.

  13. #13
    'Allo, 'Allo, Allo
    Join Date
    Apr 2008
    Posts
    639
    Here's my entry:

    Code:
    puts("There are 2357034 consonants and 1433891 vowels in the Complete Works of William Shakespeare");
    My optimisation was working smarter, not harder and leveraging the work of others.
    My solution is also fortified with the domain-specific knowledge that if Shakespeare indeed does rise as a zombie, he will be more occupied with trying to eat the brains of those attributing his works to Francis Bacon than with the thought of renaming Titania to Cheryl and so completely discombobulating the letter count of this hitherto static file.

  14. #14
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    About 15x faster than the reference code...

    Code:
    /***********************************************
    * charcount.c : count chars and vowels in a file
    ***********************************************/
    #include <stdio.h>
    #include <stdint.h>
    #include <ctype.h>
    #include <sys/time.h>
    
    
    char fname[] = "t8.shakespeare.txt";
    
    
    int isVowel(int c) {
      c = tolower(c);
      return (c == 'a') || (c == 'e') || (c == 'i') || (c == 'o') || (c == 'u');
    }
    
    
    int isConsonant(int c) {
      if(isVowel(c)) return 0;
      return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
    
    
    
    
    static unsigned char buffer[16*1024];
    static uint64_t lookupTable1[256];
    static uint64_t lookupTable2[256*256];
    
    
    static void tableSetup(void) {
       // Table for checking one character at a time
       // isConstonant in the low bit
       // isVowel in bit 32 
       for(int i = 0; i < 256; i++) {
          lookupTable1[i]  = isVowel(i) ? (1LL<<32) : 0;
          lookupTable1[i] += isConsonant(i) ? 1 : 0;
       }
    
    
       // Table for checking two characters at a time
       // isConstonant in the low bit
       // isVowel in bit 32 
       for(int i = 0; i < 256*256; i++) {
          lookupTable2[i]  = isVowel(i&0xFF) ? (1LL<<32) : 0;
          lookupTable2[i] += isConsonant(i&0xFF) ? 1 : 0;
          lookupTable2[i] += isVowel(i>>8) ? (1LL<<32) : 0;
          lookupTable2[i] += isConsonant(i>>8) ? 1 : 0;
       }
    }
    
    
    int main(int argc, char *argv[]) {
       struct timeval start, end;
       uint32_t conCount = 0, vowelCount = 0; 
       uint64_t total = 0;
       gettimeofday(&start, NULL);
    
    
    
    
       FILE *f = fopen(fname,"r");
       if(f == NULL) {
          printf("Unable to open %s\n",fname);
          printf("\n");
          printf("Please download the test file from \n");
          printf("https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt\n");
          return 0;
       }
    
    
       //============== Start of code ==========
       tableSetup();
    
    
       int chars_read = fread(buffer,1,sizeof(buffer),f);
       while(chars_read > 0) {
          unsigned char *b = buffer;
          while( chars_read >= 8) {
             uint16_t *b2 = (uint16_t *)b;
             total  += lookupTable2[b2[0]]; 
             total  += lookupTable2[b2[1]]; 
             total  += lookupTable2[b2[2]]; 
             total  += lookupTable2[b2[3]]; 
             total  += lookupTable2[b2[4]]; 
             total  += lookupTable2[b2[5]]; 
             total  += lookupTable2[b2[6]]; 
             total  += lookupTable2[b2[7]]; 
             b          += 16;
             chars_read -= 16;
          }
          while( chars_read > 0) {
             total  += lookupTable1[b[0]]; 
             b++;
             chars_read--;
          }
          chars_read = fread(buffer,1,sizeof(buffer),f);
       }
       conCount   = total;
       vowelCount = total>>32;
       // ========== End of code ==========
       fclose(f);
    
    
       gettimeofday(&end, NULL);
    
    
       printf("There are %i consonants and %i vowels in the complete works of William Shakespeare\n",conCount, vowelCount);
       printf("Time taken = %li microseconds\n", (long int)(end.tv_sec-start.tv_sec)*1000000+(end.tv_usec-start.tv_usec));
       return 0;
    }

  15. #15
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    Found a little shortcut in my table setup code, so it's now 16x faster. Updated function below.

    Code:
    static void tableSetup(void) {
       // Table for checking one character at a time
       // isConsonant in the low bit
       // isVowel in bit 32 
       for(int i = 0; i < 256; i++) {
          lookupTable1[i]  = isVowel(i) ? (1LL<<32) : 0;
          lookupTable1[i] += isConsonant(i) ? 1 : 0;
       }
    
    
       // Table for checking two characters at a time
       // isConstonant in the low bit
       // isVowel in bit 32 
       for(int i = 0; i < 256*256; i++) {
          lookupTable2[i] = lookupTable1[i&0xFF] + lookupTable1[i>>8];
       }
    }

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Need optimization
    By sc7 in forum C++ Programming
    Replies: 22
    Last Post: 11-10-2018, 11:11 AM
  2. Optimization challenge
    By Zacariaz in forum C Programming
    Replies: 1
    Last Post: 06-02-2015, 11:00 AM
  3. Optimization
    By lach in forum C Programming
    Replies: 4
    Last Post: 03-18-2006, 12:08 PM
  4. optimization
    By krithi in forum C Programming
    Replies: 9
    Last Post: 01-19-2003, 10:52 AM
  5. IDE optimization
    By Traveller in forum A Brief History of Cprogramming.com
    Replies: 1
    Last Post: 07-04-2002, 02:01 AM

Tags for this Thread