Thread: I can't spot my mistake...

  1. #106
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Anyone know why this would be reported?
    Code:
    realloc(): invalid next size

  2. #107
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    I can't get the current version pulled from GitHub to build.

    I can't bench-check the functioning of your code as it feels too weird, preventing me from seeing what is going on.

    That code feels weird to me for the following reason:

    - The for loop assigns reg as the initial setup, even though it isn't used in the test expression.

    - reg is assigned again in the loop body, to the same value, making the initial assignment not required.

    - The value of count is modified in the for loop to cause it to exit, which is an unusual pattern. It also breaks my understanding of count, as it is no longer the count of registers that have been used.

    - The assignments following the loop have a a function of count assigned to index, and then a function of index assigned to count. This is playing mental "follow the ball under the shell" game as I try to follow which variable means what at which point.

    - The 'reg' variable isn't really needed and confuses things, seems to be a hangover from earlier versions. Maybe.

    - Assigning a value into the result of the alu_used() function isn't C-like.

    If I was to rewrite it, to match my understanding of what is going on I would do it like this:

    Code:
    uint_t alu_get_reg_node( alu_t *alu, size_t Nsize )
    {
        if (!alu ) {
            (void)alu_err_null_ptr( "alu" );
            return 0;
        }
      
        // Work out size of register
        if(alu_Nsize(alu)) > Nsize)
           Nsize = alu_Nsize( alu );
    
    
        // Hunt for the first unused reg 
        uint_t index = ALU_REG_NEEDED;
        while(index < alu_used(alu) && alu_get_active( alu, index ))    
           index++;
    
    
        // Try to set up that register - will fail if out of registers
        alu_errno(alu) = alu_setup_reg( alu, index, Nsize );
        if ( alu_errno(alu) ) {
          alu_error( alu_errno(alu) );
          return 0;
        }
    
    
        // Did we need to use a new register?
        if(index == alu_used(alu))  
            alu_used( alu ) = alu_used(alu)+1;
        } 
    
    
        // Empty out the newly allocated register and mark it active
        memset( alu_data( alu, index ), 0, Nsize );
        alu_set_active( alu, index );
        return index;
    }
    I know it doesn't fit the style of the project, but I feel that the intent of the code is much clearer.
    Last edited by hamster_nz; 10-01-2020 at 03:58 AM.

  3. #108
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    Quote Originally Posted by awsdert View Post
    Anyone know why this would be reported?
    Code:
    realloc(): invalid next size
    Short answer: Usually it means that you you have corrupted your heap by accessing more memory then you have allocated.

    Long answer: Malloc allocates memory in chunks that are rounded up in size to be aligned to a particular address boundary.

    When you call realloc() looks at the next block of memory to see if it can use it to satisfy the request. Malloc has discovered that the size field for that block has been corrupted.

  4. #109
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Quote Originally Posted by hamster_nz View Post
    Short answer: Usually it means that you you have corrupted your heap by accessing more memory then you have allocated.

    Long answer: Malloc allocates memory in chunks that are rounded up in size to be aligned to a particular address boundary.

    When you call realloc() looks at the next block of memory to see if it can use it to satisfy the request. Malloc has discovered that the size field for that block has been corrupted.
    I'll answer this one 1st, just a few minutes ago I found that it was because I was using the same size for reallocation, which apparently means realloc is not designed to just give back the old pointer and pretend it was never called in that situation (despite that being the obvious thing to do)

  5. #110
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    Quote Originally Posted by awsdert View Post
    I'll answer this one 1st, just a few minutes ago I found that it was because I was using the same size for reallocation, which apparently means realloc is not designed to just give back the old pointer and pretend it was never called in that situation (despite that being the obvious thing to do)
    I would put money on you corrupting the heap. The corurption will only be visible when something calls realloc(), free(), realloc() or related functions:

    Here's a small test program that will recreate a "realloc(): invalid next size" error. Note how it allocates 100 bytes, then clears 120.


    Code:
    #include <malloc.h>
    #include <stdio.h>
    #include <memory.h>
    
    
    int main(int argc, char *argv[]) {
       char *ptr1;
    
    
       ptr1 = malloc(100);
       if(ptr1 == NULL) {
          printf("failled to malloc\n");
          return 0;
       }
    
    
    
       // Corrupt the heap
       memset(ptr1,0,120);
    
    
       // Now try the realloc
       ptr1 = realloc(ptr1,120);
    
    
       return 0;
    }
    And the output when it is run:

    Code:
    ~/realloc_test$ ./realloc_test 
    realloc(): invalid next size
    Aborted (core dumped)
    These memory corruption issues are a complete pain to debug.

    On Linux, if you add a call to mcheck() (defined in mcheck.h) before you use any of the malloc functions you will get better debug infor:

    Code:
    hamster@hamster-envy:~/realloc_test$ cat realloc_test.c 
    #include <malloc.h>
    #include <stdio.h>
    #include <mcheck.h>
    #include <memory.h>
    
    
    int main(int argc, char *argv[]) {
       char *ptr1;
       mcheck(NULL);
       ptr1 = malloc(100);
       if(ptr1 == NULL) {
          printf("failled to malloc\n");
          return 0;
       }
    
    
       // Corrupt the heap
       memset(ptr1,0,120);
    
    
       // Now try the realloc
       ptr1 = realloc(ptr1,120);
    
    
       return 0;
    }
    hamster@hamster-envy:~/realloc_test$ ./realloc_test 
    memory clobbered past end of allocated block
    Aborted (core dumped)

  6. #111
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Quote Originally Posted by hamster_nz View Post
    I can't get the current version pulled from GitHub to build.

    I can't bench-check the functioning of your code as it feels too weird, preventing me from seeing what is going on.

    That code feels weird to me for the following reason:

    - The for loop assigns reg as the initial setup, even though it isn't used in the test expression.

    - reg is assigned again in the loop body, to the same value, making the initial assignment not required.

    - The value of count is modified in the for loop to cause it to exit, which is an unusual pattern. It also breaks my understanding of count, as it is no longer the count of registers that have been used.

    - The assignments following the loop have a a function of count assigned to index, and then a function of index assigned to count. This is playing mental "follow the ball under the shell" game as I try to follow which variable means what at which point.

    - The 'reg' variable isn't really needed and confuses things, seems to be a hangover from earlier versions. Maybe.

    - Assigning a value into the result of the alu_used() function isn't C-like.

    If I was to rewrite it, to match my understanding of what is going on I would do it like this:

    Code:
    uint_t alu_get_reg_node( alu_t *alu, size_t Nsize )
    {
        if (!alu ) {
            (void)alu_err_null_ptr( "alu" );
            return 0;
        }
      
        // Work out size of register
        if(alu_Nsize(alu)) > Nsize)
           Nsize = alu_Nsize( alu );
    
    
        // Hunt for the first unused reg 
        uint_t index = ALU_REG_NEEDED;
        while(index < alu_used(alu) && alu_get_active( alu, index ))    
           index++;
    
    
        // Try to set up that register - will fail if out of registers
        alu_errno(alu) = alu_setup_reg( alu, index, Nsize );
        if ( alu_errno(alu) ) {
          alu_error( alu_errno(alu) );
          return 0;
        }
    
    
        // Did we need to use a new register?
        if(index == alu_used(alu))  
            alu_used( alu ) = alu_used(alu)+1;
        } 
    
    
        // Empty out the newly allocated register and mark it active
        memset( alu_data( alu, index ), 0, Nsize );
        alu_set_active( alu, index );
        return index;
    }
    I know it doesn't fit the style of the project, but I feel that the intent of the code is much clearer.
    For reader's reference hamster_nz is talking about this code:
    Code:
    uint_t alu_get_reg_node( alu_t *alu, size_t Nsize )
    {
    	uint_t count = 0, index, reg;
    	
    	if ( alu )
    	{
    		index = ALU_REG_ID_NEED;
    		count = alu_used( alu );
    		
    		for ( reg = index; index < count; ++index )
    		{
    			reg = index;
    			count = SET1IF( alu_get_active( alu, index ), count );
    		}
    		
    		index = HIGHEST( count, reg );
    		count = HIGHEST( index + 1, alu_used(alu) );
    		Nsize = HIGHEST( Nsize, alu_Nsize( alu ) );
    		reg = HIGHEST( count, alu_upto( alu ) );
    		
    		alu_errno(alu) = alu_setup_reg( alu, reg, Nsize );
    				
    		if ( alu_errno(alu) )
    		{
    			alu_error( alu_errno(alu) );
    			return 0;
    		}
    		
    		alu->taken = count;
    		
    		(void)memset( alu_data( alu, index ), 0, Nsize );
    		
    		alu_set_active( alu, index );
    		
    		return index;
    	}
    	
    	(void)alu_err_null_ptr( "alu" );
    	return 0;
    }
    - count is cleared for the sake of avoiding an additional branch (which slows code normally - read up on how CPUs treat branch instructions to understand what I mean), as you'll note shortly after the loop count is overwritten with the next expected used registers count, in the case an available register was found it will be the old count, in the case that no register was found then it will be the index (which itself is either the original count or an available register) + 1
    - reg is only used to hold the value of index before it is incremented, it is given back in the case count was cleared, after that the variable has no particular purpose so I reused it for holding the total registers to allocate (which is usually the existing amount, only higher if count is higher)
    - Using !alu for branching is bad because at every branch the CPU preloads instructions with the assumption that the result will be true, when false it has to throw away that preload and load the correct instructions, this is why I now place code for false results after the if statement (whether it is in an else statement or otherwise)
    - assigning a value to alu_used(alu) is fine when the "function" is nothing more than a macro to the value that can be overwritten, this can also happen with errno last I checked so it is considered acceptable in the right situations
    - Your rewrite has too many branches and would inevitably slow down the ALU so I won't be using it

  7. #112
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    Here's that same code, where it doesn't smash the heap.

    Code:
    #include <malloc.h>
    #include <stdio.h>
    #include <mcheck.h>
    #include <memory.h>
    
    
    int main(int argc, char *argv[]) {
       char *ptr1;
       mcheck(NULL);
       ptr1 = malloc(120);
       if(ptr1 == NULL) {
          printf("failled to malloc\n");
          return 0;
       }
    
    
       // Don't corrupt the heap
       memset(ptr1,0,120);
    
    
       // Now try the realloc
       ptr1 = realloc(ptr1,120);
       printf("Success\n");
       free(ptr1);
       return 0;
    }
    It prints "success" as expected, even after realloc()ing to the same size.

  8. #113
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Going back to my problem with realloc(), while adjusting the requested size handling did allow it to get further (and even create a speedup for alu_block()) it still has some issue crop up, here's the output before the test came to a screeching halt.
    Code:
    make rebuild check
    cd ../ && make --no-print-directory rebuild check
    #MAKECMDGOALS=rebuild check
    cd mak && make -j 1 --no-print-directory -f main.mak rebuild check
    PRJ_LIB_NAME=alu_d
    Checking 3rd Party libraries are upto date
    cd '../cloned/unic' && git fetch && git pull
    Finished checking
    PRJ_DST_BIN=check_alu_d.AppImage
    PRJ_DST_LIB=libalu_d.so
    rm -f ../bin/*.AppImage ../bin/*.exe
    rm -f ../lib/*.so ../lib/*.dll
    rm -f ../src/*.o ../src/*.obj
    rm -f ../tests/*.o ../tests/*.obj
    cc -ggdb -D _DEBUG -O0  -fPIC -Wall -Wextra -pedantic -I ../cloned/unic/include -I ../include -D UNIC_FALLBACK  -o ../src/alu_bit_d.o -c ../src/alu_bit.c
    cc -ggdb -D _DEBUG -O0  -fPIC -Wall -Wextra -pedantic -I ../cloned/unic/include -I ../include -D UNIC_FALLBACK  -o ../src/alu_fpn_d.o -c ../src/alu_fpn.c
    cc -ggdb -D _DEBUG -O0  -fPIC -Wall -Wextra -pedantic -I ../cloned/unic/include -I ../include -D UNIC_FALLBACK  -o ../src/alu_int_d.o -c ../src/alu_int.c
    cc -ggdb -D _DEBUG -O0  -fPIC -Wall -Wextra -pedantic -I ../cloned/unic/include -I ../include -D UNIC_FALLBACK  -o ../src/alu_main_d.o -c ../src/alu_main.c
    cc -ggdb -D _DEBUG -O0  -fPIC -Wall -Wextra -pedantic -I ../cloned/unic/include -I ../include -D UNIC_FALLBACK  -o ../src/alu_math_d.o -c ../src/alu_math.c
    cc -ggdb -D _DEBUG -O0  -fPIC -Wall -Wextra -pedantic -I ../cloned/unic/include -I ../include -D UNIC_FALLBACK  -o ../src/alu_mem_d.o -c ../src/alu_mem.c
    cc -ggdb -D _DEBUG -O0  -fPIC -Wall -Wextra -pedantic -I ../cloned/unic/include -I ../include -D UNIC_FALLBACK  -o ../src/alu_uint_d.o -c ../src/alu_uint.c
    cc -ggdb -D _DEBUG -O0  -fPIC -Wall -Wextra -pedantic -I ../cloned/unic/include -I ../include -D UNIC_FALLBACK  -o ../src/alu_vec_d.o -c ../src/alu_vec.c
    cc -ggdb -D _DEBUG -O0  -fPIC -shared  -o ../lib/libalu_d.so ../src/alu_bit_d.o ../src/alu_fpn_d.o ../src/alu_int_d.o ../src/alu_main_d.o ../src/alu_math_d.o ../src/alu_mem_d.o ../src/alu_uint_d.o ../src/alu_vec_d.o -Wl,-rpath,../lib
    cc -ggdb -D _DEBUG -O0  -fPIC -Wall -Wextra -pedantic -I ../cloned/unic/include -I ../include -D UNIC_FALLBACK  -o ../tests/check_alu_d.o -c ../tests/check_alu.c
    In file included from ../tests/check_alu.c:181:
    ../tests/check_alu.c:491:13: warning: ‘test_alu_reg2str’ defined but not used [-Wunused-variable]
      491 | START_TEST( test_alu_reg2str )
          |             ^~~~~~~~~~~~~~~~
    ../tests/check_alu.c:356:13: warning: ‘test_alu_op3’ defined but not used [-Wunused-variable]
      356 | START_TEST( test_alu_op3 )
          |             ^~~~~~~~~~~~
    cc -ggdb -D _DEBUG -O0  -fPIE -L ../lib  -o ../bin/check_alu_d.AppImage ../tests/check_alu_d.o -Wl,-rpath,../lib -l alu_d -l check
    ../bin/check_alu_d.AppImage
    ../tests/check_alu.c:864: main() 'Running unit tests under 'check' test suite'
    Running suite(s): ALU
    ../src/alu_mem.c:68: alu_block() want = 512, size = 544, given = 0
    ../src/alu_mem.c:30: alu_block() want = 1040, size = 1072, given = 512
    ../src/alu_mem.c:30: alu_block() want = 1056, size = 1088, given = 1040
    ../src/alu_mem.c:30: alu_block() want = 1072, size = 1104, given = 1056
    ../src/alu_mem.c:30: alu_block() want = 1088, size = 1120, given = 1072
    ../src/alu_mem.c:30: alu_block() want = 1104, size = 1136, given = 1088
    ../src/alu_mem.c:30: alu_block() want = 1120, size = 1152, given = 1104
    ../src/alu_mem.c:30: alu_block() want = 1136, size = 1168, given = 1120
    ../src/alu_mem.c:30: alu_block() want = 1152, size = 1184, given = 1136
    ../src/alu_mem.c:30: alu_block() want = 1168, size = 1200, given = 1152
    ../src/alu_mem.c:30: alu_block() want = 1184, size = 1216, given = 1168
    ../src/alu_mem.c:30: alu_block() want = 1200, size = 1232, given = 1184
    ../src/alu_mem.c:30: alu_block() want = 1216, size = 1248, given = 1200
    ../src/alu_mem.c:30: alu_block() want = 1232, size = 1264, given = 1216
    ../src/alu_mem.c:30: alu_block() want = 1248, size = 1280, given = 1232
    ../src/alu_mem.c:30: alu_block() want = 1264, size = 1296, given = 1248
    ../src/alu_mem.c:30: alu_block() want = 1280, size = 1312, given = 1264
    ../src/alu_mem.c:30: alu_block() want = 1296, size = 1328, given = 1280
    ../src/alu_mem.c:30: alu_block() want = 1312, size = 1344, given = 1296
    ../src/alu_mem.c:30: alu_block() want = 1328, size = 1360, given = 1312
    ../src/alu_mem.c:30: alu_block() want = 1344, size = 1376, given = 1328
    ../src/alu_mem.c:30: alu_block() want = 1360, size = 1392, given = 1344
    ../src/alu_mem.c:30: alu_block() want = 1376, size = 1408, given = 1360
    ../src/alu_mem.c:30: alu_block() want = 1392, size = 1424, given = 1376
    ../src/alu_mem.c:30: alu_block() want = 1408, size = 1440, given = 1392
    ../src/alu_mem.c:30: alu_block() want = 1424, size = 1456, given = 1408
    ../src/alu_mem.c:30: alu_block() want = 1440, size = 1472, given = 1424
    ../src/alu_mem.c:30: alu_block() want = 1456, size = 1488, given = 1440
    ../src/alu_mem.c:30: alu_block() want = 1472, size = 1504, given = 1456
    ../src/alu_mem.c:30: alu_block() want = 1488, size = 1520, given = 1472
    ../src/alu_mem.c:30: alu_block() want = 1504, size = 1536, given = 1488
    ../src/alu_mem.c:30: alu_block() want = 1520, size = 1552, given = 1504
    ../src/alu_mem.c:30: alu_block() want = 1536, size = 1568, given = 1520
    ../src/alu_mem.c:30: alu_block() want = 1552, size = 1584, given = 1536
    ../src/alu_mem.c:30: alu_block() want = 1568, size = 1600, given = 1552
    ../src/alu_mem.c:30: alu_block() want = 1584, size = 1616, given = 1568
    ../src/alu_mem.c:30: alu_block() want = 1600, size = 1632, given = 1584
    ../src/alu_mem.c:30: alu_block() want = 1616, size = 1648, given = 1600
    ../src/alu_mem.c:30: alu_block() want = 1632, size = 1664, given = 1616
    ../src/alu_mem.c:30: alu_block() want = 1648, size = 1680, given = 1632
    ../src/alu_mem.c:30: alu_block() want = 1664, size = 1696, given = 1648
    ../src/alu_mem.c:30: alu_block() want = 1680, size = 1712, given = 1664
    ../src/alu_mem.c:30: alu_block() want = 1696, size = 1728, given = 1680
    ../src/alu_mem.c:30: alu_block() want = 1712, size = 1744, given = 1696
    ../src/alu_mem.c:30: alu_block() want = 1728, size = 1760, given = 1712
    ../src/alu_mem.c:30: alu_block() want = 1744, size = 1776, given = 1728
    ../src/alu_mem.c:30: alu_block() want = 1760, size = 1792, given = 1744
    ../src/alu_mem.c:30: alu_block() want = 1776, size = 1808, given = 1760
    ../src/alu_mem.c:30: alu_block() want = 1792, size = 1824, given = 1776
    ../src/alu_mem.c:30: alu_block() want = 1808, size = 1840, given = 1792
    ../src/alu_mem.c:30: alu_block() want = 1824, size = 1856, given = 1808
    ../src/alu_mem.c:30: alu_block() want = 1840, size = 1872, given = 1824
    ../src/alu_mem.c:30: alu_block() want = 1856, size = 1888, given = 1840
    ../src/alu_mem.c:30: alu_block() want = 1872, size = 1904, given = 1856
    ../src/alu_mem.c:30: alu_block() want = 1888, size = 1920, given = 1872
    ../src/alu_mem.c:30: alu_block() want = 1904, size = 1936, given = 1888
    ../src/alu_mem.c:30: alu_block() want = 1920, size = 1952, given = 1904
    ../src/alu_mem.c:30: alu_block() want = 1936, size = 1968, given = 1920
    ../src/alu_mem.c:30: alu_block() want = 1952, size = 1984, given = 1936
    ../src/alu_mem.c:30: alu_block() want = 1968, size = 2000, given = 1952
    ../src/alu_mem.c:30: alu_block() want = 1984, size = 2016, given = 1968
    ../src/alu_mem.c:30: alu_block() want = 2000, size = 2032, given = 1984
    ../src/alu_mem.c:30: alu_block() want = 2016, size = 2048, given = 2000
    ../src/alu_mem.c:30: alu_block() want = 2032, size = 2064, given = 2016
    check_alu_d.AppImage: malloc.c:2394: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
    make[2]: *** [main.mak:99: check] Aborted (core dumped)
    rm ../src/alu_int_d.o ../src/alu_math_d.o ../tests/check_alu_d.o ../src/alu_main_d.o ../src/alu_uint_d.o ../src/alu_fpn_d.o ../src/alu_bit_d.o ../src/alu_mem_d.o ../src/alu_vec_d.o
    make[1]: *** [makefile:10: rebuild] Error 2
    make: *** [makefile:4: rebuild] Error 2
    Compilation failed.

  9. #114
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    - count is cleared for the sake of avoiding an additional branch (which slows code normally - read up on how CPUs treat branch instructions to understand what I mean), as you'll note shortly after the loop count is overwritten with the next expected used registers count, in the case an available register was found it will be the old count, in the case that no register was found then it will be the index (which itself is either the original count or an available register) + 1
    I know that branching is generally bad. In the recent past I have worked on CPU branch prediction H/W design... but I think you are overly pessimistic on how bad it is. The 90/10 rule apply - you only have to optimise 10% of the code to get 90% of the benefit. When it comes to code optimisation objective timings win over dogma...

    - reg is only used to hold the value of index before it is incremented, it is given back in the case count was cleared, after that the variable has no particular purpose so I reused it for holding the total registers to allocate (which is usually the existing amount, only higher if count is higher)
    Reusing a variable just because it is there lowers the readability of the code and may also hinders the compilers ability to do optimizations or print helpful messages like "Opps - you are not initialising this variable before use".

    - Using !alu for branching is bad because at every branch the CPU preloads instructions with the assumption that the result will be true, when false it has to throw away that preload and load the correct instructions, this is why I now place code for false results after the if statement (whether it is in an else statement or otherwise)
    This is mostly false on modern CPUs. The instruction fetch unit of modern CPUs uses branch prediction to and speculative execution to have the right instructions decoded and in flight most of the time. Modern CPUs recognise patterns, so it the last few conditional branches have a previously seen pattern it will expect the pattern to continue.
    This allows the cpu to predict things like
    Code:
    for(i = 0; i < 4; i++) {....}
    and not miss-predict during the last pass of the loop.

    It's a really big topic, the Wikipedia page Branch predictor - Wikipedia only scratches the surface.

    Also, compilers can also move around blocks of code, and even duplicate them (i.e. unrolling loops), which mean that the relationships you see in the source may not be present in the generated instruction stream.

    - assigning a value to alu_used(alu) is fine when the "function" is nothing more than a macro to the value that can be overwritten, this can also happen with errno last I checked so it is considered acceptable in the right situations
    Sure it is legal, but is quirky, You have to be aware that "alu_used(alu)" is macro, so "alu_used(alu) = whatever;" could just written "alu->taken = whatever;"

    - Your rewrite has too many branches and would inevitably slow down the ALU so I won't be using it
    Although I don't agree, I'm fine with that. One can only offer advice, not force it to be taken....

  10. #115
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    As an example of false optimization, take HIGHEST(a,b) on my x64 platform.

    Here's your version:
    Code:
    #define HIGHEST( A, B ) ( ((A) * ((A) >= (B))) | ((B) * ((B) >= (A))) )
    It compiles down to this instruction stream:

    Code:
    	xorl	%edx, %edx
    	cmpl	%ebx, %eax
    	setle	%dl
    	xorl	%ecx, %ecx
    	imull	%ebx, %edx
    	cmpl	%ebx, %eax
    	setge	%cl
    but this version:

    Code:
    #define HIGHEST( A, B ) ( A > B ? A : B)
    Compiles down to the (still branchless) but much quicker:
    Code:
            cmpl    %eax, %ebx
            cmovg   %ebx, %eax

  11. #116
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Quote Originally Posted by hamster_nz View Post
    I know that branching is generally bad. In the recent past I have worked on CPU branch prediction H/W design... but I think you are overly pessimistic on how bad it is. The 90/10 rule apply - you only have to optimise 10% of the code to get 90% of the benefit. When it comes to code optimisation objective timings win over dogma...



    Reusing a variable just because it is there lowers the readability of the code and may also hinders the compilers ability to do optimizations or print helpful messages like "Opps - you are not initialising this variable before use".



    This is mostly false on modern CPUs. The instruction fetch unit of modern CPUs uses branch prediction to and speculative execution to have the right instructions decoded and in flight most of the time. Modern CPUs recognise patterns, so it the last few conditional branches have a previously seen pattern it will expect the pattern to continue.
    This allows the cpu to predict things like
    Code:
    for(i = 0; i < 4; i++) {....}
    and not miss-predict during the last pass of the loop.

    It's a really big topic, the Wikipedia page Branch predictor - Wikipedia only scratches the surface.

    Also, compilers can also move around blocks of code, and even duplicate them (i.e. unrolling loops), which mean that the relationships you see in the source may not be present in the generated instruction stream.



    Sure it is legal, but is quirky, You have to be aware that "alu_used(alu)" is macro, so "alu_used(alu) = whatever;" could just written "alu->taken = whatever;"



    Although I don't agree, I'm fine with that. One can only offer advice, not force it to be taken....
    It's not about being pessimistic, it's just I consider using slower code to be a fore-par when you know there is faster code you can use, ideally I would do this all in ASM and eliminate branching altogether by utilizing some relative address loading instruction and the multiply instruction (or shift if no such instruction is available) but my ASM skill/knowledge are poor at the moment so I can't.

  12. #117
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Managed to resolve my memory issues (not the ones in my head though ) by using mmap instead, aside from needing a windows equivalent - which I think was call VirtualMemoryAlloc() - is there any potential problems anyone sees in the code here:
    Apparently realloc cannot be relied upon for acquiring available memory, * awsdert/alu@6a204dc * GitHub

  13. #118
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    Quote Originally Posted by awsdert View Post
    Managed to resolve my memory issues (not the ones in my head though ) by using mmap instead, aside from needing a windows equivalent - which I think was call VirtualMemoryAlloc() - is there any potential problems anyone sees in the code here:
    Apparently realloc cannot be relied upon for acquiring available memory, * awsdert/alu@6a204dc * GitHub
    You have hidden the bug, not fixed it. Nine out of ten chance that it will come back to bite you with random corruptions and inexplicable issues

    Realloc() does work, but only as long as you don't clobber the heap by writing to memory outside of your allocations

  14. #119
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Quote Originally Posted by hamster_nz View Post
    You have hidden the bug, not fixed it. Nine out of ten chance that it will come back to bite you with random corruptions and inexplicable issues

    Realloc() does work, but only as long as you don't clobber the heap by writing to memory outside of your allocations
    No it doesn't fully work, I had tracked the overwrite down to a situation where realloc had returned a valid pointer but never actually acquired the needed memory, leaving my code to believe it succeeded and carry on as though it had done so, where I had check the return code I switched to checking the memory given against the expected memory and found after a dozen or so realloc's the memory given turned out to be smaller than expected but because I had checked the return code instead of the given bytes I ran into a memory clobber situation, after using mmap() the situation just vanished, and frankly since the ALU will need to be optimised anyways mmap() is the better choice anyways, at least there I can skip assigning memory for the parameters that shouldn't have been appended but prepended to the memory given

  15. #120
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by awsdert
    No it doesn't fully work, I had tracked the overwrite down to a situation where realloc had returned a valid pointer but never actually acquired the needed memory, leaving my code to believe it succeeded and carry on as though it had done so
    Are you sure you don't have a bug resulting in undefined behaviour elsewhere?

    Having said that, yes, I believe it is true that on Linux malloc and presumably also realloc is not guaranteed to return a null pointer on allocation failure.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. can anyone help to spot mistake in the code
    By chess_queen in forum C Programming
    Replies: 1
    Last Post: 10-21-2012, 10:37 AM
  2. Can you spot my mistake?
    By Brewer in forum C Programming
    Replies: 13
    Last Post: 11-12-2006, 12:50 PM
  3. going to a certain spot in a file...
    By agerealm in forum C++ Programming
    Replies: 3
    Last Post: 05-17-2002, 02:31 AM

Tags for this Thread