Anyone know why this would be reported?
Code:realloc(): invalid next size
Anyone know why this would be reported?
Code:realloc(): invalid next size
I can't get the current version pulled from GitHub to build.
I can't bench-check the functioning of your code as it feels too weird, preventing me from seeing what is going on.
That code feels weird to me for the following reason:
- The for loop assigns reg as the initial setup, even though it isn't used in the test expression.
- reg is assigned again in the loop body, to the same value, making the initial assignment not required.
- The value of count is modified in the for loop to cause it to exit, which is an unusual pattern. It also breaks my understanding of count, as it is no longer the count of registers that have been used.
- The assignments following the loop have a a function of count assigned to index, and then a function of index assigned to count. This is playing mental "follow the ball under the shell" game as I try to follow which variable means what at which point.
- The 'reg' variable isn't really needed and confuses things, seems to be a hangover from earlier versions. Maybe.
- Assigning a value into the result of the alu_used() function isn't C-like.
If I was to rewrite it, to match my understanding of what is going on I would do it like this:
I know it doesn't fit the style of the project, but I feel that the intent of the code is much clearer.Code:uint_t alu_get_reg_node( alu_t *alu, size_t Nsize ) { if (!alu ) { (void)alu_err_null_ptr( "alu" ); return 0; } // Work out size of register if(alu_Nsize(alu)) > Nsize) Nsize = alu_Nsize( alu ); // Hunt for the first unused reg uint_t index = ALU_REG_NEEDED; while(index < alu_used(alu) && alu_get_active( alu, index )) index++; // Try to set up that register - will fail if out of registers alu_errno(alu) = alu_setup_reg( alu, index, Nsize ); if ( alu_errno(alu) ) { alu_error( alu_errno(alu) ); return 0; } // Did we need to use a new register? if(index == alu_used(alu)) alu_used( alu ) = alu_used(alu)+1; } // Empty out the newly allocated register and mark it active memset( alu_data( alu, index ), 0, Nsize ); alu_set_active( alu, index ); return index; }
Last edited by hamster_nz; 10-01-2020 at 03:58 AM.
Short answer: Usually it means that you you have corrupted your heap by accessing more memory then you have allocated.
Long answer: Malloc allocates memory in chunks that are rounded up in size to be aligned to a particular address boundary.
When you call realloc() looks at the next block of memory to see if it can use it to satisfy the request. Malloc has discovered that the size field for that block has been corrupted.
I'll answer this one 1st, just a few minutes ago I found that it was because I was using the same size for reallocation, which apparently means realloc is not designed to just give back the old pointer and pretend it was never called in that situation (despite that being the obvious thing to do)
I would put money on you corrupting the heap. The corurption will only be visible when something calls realloc(), free(), realloc() or related functions:
Here's a small test program that will recreate a "realloc(): invalid next size" error. Note how it allocates 100 bytes, then clears 120.
And the output when it is run:Code:#include <malloc.h> #include <stdio.h> #include <memory.h> int main(int argc, char *argv[]) { char *ptr1; ptr1 = malloc(100); if(ptr1 == NULL) { printf("failled to malloc\n"); return 0; } // Corrupt the heap memset(ptr1,0,120); // Now try the realloc ptr1 = realloc(ptr1,120); return 0; }
These memory corruption issues are a complete pain to debug.Code:~/realloc_test$ ./realloc_test realloc(): invalid next size Aborted (core dumped)
On Linux, if you add a call to mcheck() (defined in mcheck.h) before you use any of the malloc functions you will get better debug infor:
Code:hamster@hamster-envy:~/realloc_test$ cat realloc_test.c #include <malloc.h> #include <stdio.h> #include <mcheck.h> #include <memory.h> int main(int argc, char *argv[]) { char *ptr1; mcheck(NULL); ptr1 = malloc(100); if(ptr1 == NULL) { printf("failled to malloc\n"); return 0; } // Corrupt the heap memset(ptr1,0,120); // Now try the realloc ptr1 = realloc(ptr1,120); return 0; } hamster@hamster-envy:~/realloc_test$ ./realloc_test memory clobbered past end of allocated block Aborted (core dumped)
For reader's reference hamster_nz is talking about this code:
- count is cleared for the sake of avoiding an additional branch (which slows code normally - read up on how CPUs treat branch instructions to understand what I mean), as you'll note shortly after the loop count is overwritten with the next expected used registers count, in the case an available register was found it will be the old count, in the case that no register was found then it will be the index (which itself is either the original count or an available register) + 1Code:uint_t alu_get_reg_node( alu_t *alu, size_t Nsize ) { uint_t count = 0, index, reg; if ( alu ) { index = ALU_REG_ID_NEED; count = alu_used( alu ); for ( reg = index; index < count; ++index ) { reg = index; count = SET1IF( alu_get_active( alu, index ), count ); } index = HIGHEST( count, reg ); count = HIGHEST( index + 1, alu_used(alu) ); Nsize = HIGHEST( Nsize, alu_Nsize( alu ) ); reg = HIGHEST( count, alu_upto( alu ) ); alu_errno(alu) = alu_setup_reg( alu, reg, Nsize ); if ( alu_errno(alu) ) { alu_error( alu_errno(alu) ); return 0; } alu->taken = count; (void)memset( alu_data( alu, index ), 0, Nsize ); alu_set_active( alu, index ); return index; } (void)alu_err_null_ptr( "alu" ); return 0; }
- reg is only used to hold the value of index before it is incremented, it is given back in the case count was cleared, after that the variable has no particular purpose so I reused it for holding the total registers to allocate (which is usually the existing amount, only higher if count is higher)
- Using !alu for branching is bad because at every branch the CPU preloads instructions with the assumption that the result will be true, when false it has to throw away that preload and load the correct instructions, this is why I now place code for false results after the if statement (whether it is in an else statement or otherwise)
- assigning a value to alu_used(alu) is fine when the "function" is nothing more than a macro to the value that can be overwritten, this can also happen with errno last I checked so it is considered acceptable in the right situations
- Your rewrite has too many branches and would inevitably slow down the ALU so I won't be using it
Here's that same code, where it doesn't smash the heap.
It prints "success" as expected, even after realloc()ing to the same size.Code:#include <malloc.h> #include <stdio.h> #include <mcheck.h> #include <memory.h> int main(int argc, char *argv[]) { char *ptr1; mcheck(NULL); ptr1 = malloc(120); if(ptr1 == NULL) { printf("failled to malloc\n"); return 0; } // Don't corrupt the heap memset(ptr1,0,120); // Now try the realloc ptr1 = realloc(ptr1,120); printf("Success\n"); free(ptr1); return 0; }
Going back to my problem with realloc(), while adjusting the requested size handling did allow it to get further (and even create a speedup for alu_block()) it still has some issue crop up, here's the output before the test came to a screeching halt.
Code:make rebuild check cd ../ && make --no-print-directory rebuild check #MAKECMDGOALS=rebuild check cd mak && make -j 1 --no-print-directory -f main.mak rebuild check PRJ_LIB_NAME=alu_d Checking 3rd Party libraries are upto date cd '../cloned/unic' && git fetch && git pull Finished checking PRJ_DST_BIN=check_alu_d.AppImage PRJ_DST_LIB=libalu_d.so rm -f ../bin/*.AppImage ../bin/*.exe rm -f ../lib/*.so ../lib/*.dll rm -f ../src/*.o ../src/*.obj rm -f ../tests/*.o ../tests/*.obj cc -ggdb -D _DEBUG -O0 -fPIC -Wall -Wextra -pedantic -I ../cloned/unic/include -I ../include -D UNIC_FALLBACK -o ../src/alu_bit_d.o -c ../src/alu_bit.c cc -ggdb -D _DEBUG -O0 -fPIC -Wall -Wextra -pedantic -I ../cloned/unic/include -I ../include -D UNIC_FALLBACK -o ../src/alu_fpn_d.o -c ../src/alu_fpn.c cc -ggdb -D _DEBUG -O0 -fPIC -Wall -Wextra -pedantic -I ../cloned/unic/include -I ../include -D UNIC_FALLBACK -o ../src/alu_int_d.o -c ../src/alu_int.c cc -ggdb -D _DEBUG -O0 -fPIC -Wall -Wextra -pedantic -I ../cloned/unic/include -I ../include -D UNIC_FALLBACK -o ../src/alu_main_d.o -c ../src/alu_main.c cc -ggdb -D _DEBUG -O0 -fPIC -Wall -Wextra -pedantic -I ../cloned/unic/include -I ../include -D UNIC_FALLBACK -o ../src/alu_math_d.o -c ../src/alu_math.c cc -ggdb -D _DEBUG -O0 -fPIC -Wall -Wextra -pedantic -I ../cloned/unic/include -I ../include -D UNIC_FALLBACK -o ../src/alu_mem_d.o -c ../src/alu_mem.c cc -ggdb -D _DEBUG -O0 -fPIC -Wall -Wextra -pedantic -I ../cloned/unic/include -I ../include -D UNIC_FALLBACK -o ../src/alu_uint_d.o -c ../src/alu_uint.c cc -ggdb -D _DEBUG -O0 -fPIC -Wall -Wextra -pedantic -I ../cloned/unic/include -I ../include -D UNIC_FALLBACK -o ../src/alu_vec_d.o -c ../src/alu_vec.c cc -ggdb -D _DEBUG -O0 -fPIC -shared -o ../lib/libalu_d.so ../src/alu_bit_d.o ../src/alu_fpn_d.o ../src/alu_int_d.o ../src/alu_main_d.o ../src/alu_math_d.o ../src/alu_mem_d.o ../src/alu_uint_d.o ../src/alu_vec_d.o -Wl,-rpath,../lib cc -ggdb -D _DEBUG -O0 -fPIC -Wall -Wextra -pedantic -I ../cloned/unic/include -I ../include -D UNIC_FALLBACK -o ../tests/check_alu_d.o -c ../tests/check_alu.c In file included from ../tests/check_alu.c:181: ../tests/check_alu.c:491:13: warning: ‘test_alu_reg2str’ defined but not used [-Wunused-variable] 491 | START_TEST( test_alu_reg2str ) | ^~~~~~~~~~~~~~~~ ../tests/check_alu.c:356:13: warning: ‘test_alu_op3’ defined but not used [-Wunused-variable] 356 | START_TEST( test_alu_op3 ) | ^~~~~~~~~~~~ cc -ggdb -D _DEBUG -O0 -fPIE -L ../lib -o ../bin/check_alu_d.AppImage ../tests/check_alu_d.o -Wl,-rpath,../lib -l alu_d -l check ../bin/check_alu_d.AppImage ../tests/check_alu.c:864: main() 'Running unit tests under 'check' test suite' Running suite(s): ALU ../src/alu_mem.c:68: alu_block() want = 512, size = 544, given = 0 ../src/alu_mem.c:30: alu_block() want = 1040, size = 1072, given = 512 ../src/alu_mem.c:30: alu_block() want = 1056, size = 1088, given = 1040 ../src/alu_mem.c:30: alu_block() want = 1072, size = 1104, given = 1056 ../src/alu_mem.c:30: alu_block() want = 1088, size = 1120, given = 1072 ../src/alu_mem.c:30: alu_block() want = 1104, size = 1136, given = 1088 ../src/alu_mem.c:30: alu_block() want = 1120, size = 1152, given = 1104 ../src/alu_mem.c:30: alu_block() want = 1136, size = 1168, given = 1120 ../src/alu_mem.c:30: alu_block() want = 1152, size = 1184, given = 1136 ../src/alu_mem.c:30: alu_block() want = 1168, size = 1200, given = 1152 ../src/alu_mem.c:30: alu_block() want = 1184, size = 1216, given = 1168 ../src/alu_mem.c:30: alu_block() want = 1200, size = 1232, given = 1184 ../src/alu_mem.c:30: alu_block() want = 1216, size = 1248, given = 1200 ../src/alu_mem.c:30: alu_block() want = 1232, size = 1264, given = 1216 ../src/alu_mem.c:30: alu_block() want = 1248, size = 1280, given = 1232 ../src/alu_mem.c:30: alu_block() want = 1264, size = 1296, given = 1248 ../src/alu_mem.c:30: alu_block() want = 1280, size = 1312, given = 1264 ../src/alu_mem.c:30: alu_block() want = 1296, size = 1328, given = 1280 ../src/alu_mem.c:30: alu_block() want = 1312, size = 1344, given = 1296 ../src/alu_mem.c:30: alu_block() want = 1328, size = 1360, given = 1312 ../src/alu_mem.c:30: alu_block() want = 1344, size = 1376, given = 1328 ../src/alu_mem.c:30: alu_block() want = 1360, size = 1392, given = 1344 ../src/alu_mem.c:30: alu_block() want = 1376, size = 1408, given = 1360 ../src/alu_mem.c:30: alu_block() want = 1392, size = 1424, given = 1376 ../src/alu_mem.c:30: alu_block() want = 1408, size = 1440, given = 1392 ../src/alu_mem.c:30: alu_block() want = 1424, size = 1456, given = 1408 ../src/alu_mem.c:30: alu_block() want = 1440, size = 1472, given = 1424 ../src/alu_mem.c:30: alu_block() want = 1456, size = 1488, given = 1440 ../src/alu_mem.c:30: alu_block() want = 1472, size = 1504, given = 1456 ../src/alu_mem.c:30: alu_block() want = 1488, size = 1520, given = 1472 ../src/alu_mem.c:30: alu_block() want = 1504, size = 1536, given = 1488 ../src/alu_mem.c:30: alu_block() want = 1520, size = 1552, given = 1504 ../src/alu_mem.c:30: alu_block() want = 1536, size = 1568, given = 1520 ../src/alu_mem.c:30: alu_block() want = 1552, size = 1584, given = 1536 ../src/alu_mem.c:30: alu_block() want = 1568, size = 1600, given = 1552 ../src/alu_mem.c:30: alu_block() want = 1584, size = 1616, given = 1568 ../src/alu_mem.c:30: alu_block() want = 1600, size = 1632, given = 1584 ../src/alu_mem.c:30: alu_block() want = 1616, size = 1648, given = 1600 ../src/alu_mem.c:30: alu_block() want = 1632, size = 1664, given = 1616 ../src/alu_mem.c:30: alu_block() want = 1648, size = 1680, given = 1632 ../src/alu_mem.c:30: alu_block() want = 1664, size = 1696, given = 1648 ../src/alu_mem.c:30: alu_block() want = 1680, size = 1712, given = 1664 ../src/alu_mem.c:30: alu_block() want = 1696, size = 1728, given = 1680 ../src/alu_mem.c:30: alu_block() want = 1712, size = 1744, given = 1696 ../src/alu_mem.c:30: alu_block() want = 1728, size = 1760, given = 1712 ../src/alu_mem.c:30: alu_block() want = 1744, size = 1776, given = 1728 ../src/alu_mem.c:30: alu_block() want = 1760, size = 1792, given = 1744 ../src/alu_mem.c:30: alu_block() want = 1776, size = 1808, given = 1760 ../src/alu_mem.c:30: alu_block() want = 1792, size = 1824, given = 1776 ../src/alu_mem.c:30: alu_block() want = 1808, size = 1840, given = 1792 ../src/alu_mem.c:30: alu_block() want = 1824, size = 1856, given = 1808 ../src/alu_mem.c:30: alu_block() want = 1840, size = 1872, given = 1824 ../src/alu_mem.c:30: alu_block() want = 1856, size = 1888, given = 1840 ../src/alu_mem.c:30: alu_block() want = 1872, size = 1904, given = 1856 ../src/alu_mem.c:30: alu_block() want = 1888, size = 1920, given = 1872 ../src/alu_mem.c:30: alu_block() want = 1904, size = 1936, given = 1888 ../src/alu_mem.c:30: alu_block() want = 1920, size = 1952, given = 1904 ../src/alu_mem.c:30: alu_block() want = 1936, size = 1968, given = 1920 ../src/alu_mem.c:30: alu_block() want = 1952, size = 1984, given = 1936 ../src/alu_mem.c:30: alu_block() want = 1968, size = 2000, given = 1952 ../src/alu_mem.c:30: alu_block() want = 1984, size = 2016, given = 1968 ../src/alu_mem.c:30: alu_block() want = 2000, size = 2032, given = 1984 ../src/alu_mem.c:30: alu_block() want = 2016, size = 2048, given = 2000 ../src/alu_mem.c:30: alu_block() want = 2032, size = 2064, given = 2016 check_alu_d.AppImage: malloc.c:2394: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed. make[2]: *** [main.mak:99: check] Aborted (core dumped) rm ../src/alu_int_d.o ../src/alu_math_d.o ../tests/check_alu_d.o ../src/alu_main_d.o ../src/alu_uint_d.o ../src/alu_fpn_d.o ../src/alu_bit_d.o ../src/alu_mem_d.o ../src/alu_vec_d.o make[1]: *** [makefile:10: rebuild] Error 2 make: *** [makefile:4: rebuild] Error 2 Compilation failed.
I know that branching is generally bad. In the recent past I have worked on CPU branch prediction H/W design... but I think you are overly pessimistic on how bad it is. The 90/10 rule apply - you only have to optimise 10% of the code to get 90% of the benefit. When it comes to code optimisation objective timings win over dogma...- count is cleared for the sake of avoiding an additional branch (which slows code normally - read up on how CPUs treat branch instructions to understand what I mean), as you'll note shortly after the loop count is overwritten with the next expected used registers count, in the case an available register was found it will be the old count, in the case that no register was found then it will be the index (which itself is either the original count or an available register) + 1
Reusing a variable just because it is there lowers the readability of the code and may also hinders the compilers ability to do optimizations or print helpful messages like "Opps - you are not initialising this variable before use".- reg is only used to hold the value of index before it is incremented, it is given back in the case count was cleared, after that the variable has no particular purpose so I reused it for holding the total registers to allocate (which is usually the existing amount, only higher if count is higher)
This is mostly false on modern CPUs. The instruction fetch unit of modern CPUs uses branch prediction to and speculative execution to have the right instructions decoded and in flight most of the time. Modern CPUs recognise patterns, so it the last few conditional branches have a previously seen pattern it will expect the pattern to continue.- Using !alu for branching is bad because at every branch the CPU preloads instructions with the assumption that the result will be true, when false it has to throw away that preload and load the correct instructions, this is why I now place code for false results after the if statement (whether it is in an else statement or otherwise)
This allows the cpu to predict things likeand not miss-predict during the last pass of the loop.Code:for(i = 0; i < 4; i++) {....}
It's a really big topic, the Wikipedia page Branch predictor - Wikipedia only scratches the surface.
Also, compilers can also move around blocks of code, and even duplicate them (i.e. unrolling loops), which mean that the relationships you see in the source may not be present in the generated instruction stream.
Sure it is legal, but is quirky, You have to be aware that "alu_used(alu)" is macro, so "alu_used(alu) = whatever;" could just written "alu->taken = whatever;"- assigning a value to alu_used(alu) is fine when the "function" is nothing more than a macro to the value that can be overwritten, this can also happen with errno last I checked so it is considered acceptable in the right situations
Although I don't agree, I'm fine with that. One can only offer advice, not force it to be taken....- Your rewrite has too many branches and would inevitably slow down the ALU so I won't be using it
As an example of false optimization, take HIGHEST(a,b) on my x64 platform.
Here's your version:
It compiles down to this instruction stream:Code:#define HIGHEST( A, B ) ( ((A) * ((A) >= (B))) | ((B) * ((B) >= (A))) )
but this version:Code:xorl %edx, %edx cmpl %ebx, %eax setle %dl xorl %ecx, %ecx imull %ebx, %edx cmpl %ebx, %eax setge %cl
Compiles down to the (still branchless) but much quicker:Code:#define HIGHEST( A, B ) ( A > B ? A : B)
Code:cmpl %eax, %ebx cmovg %ebx, %eax
It's not about being pessimistic, it's just I consider using slower code to be a fore-par when you know there is faster code you can use, ideally I would do this all in ASM and eliminate branching altogether by utilizing some relative address loading instruction and the multiply instruction (or shift if no such instruction is available) but my ASM skill/knowledge are poor at the moment so I can't.
Managed to resolve my memory issues (not the ones in my head though ) by using mmap instead, aside from needing a windows equivalent - which I think was call VirtualMemoryAlloc() - is there any potential problems anyone sees in the code here:
Apparently realloc cannot be relied upon for acquiring available memory, * awsdert/alu@6a204dc * GitHub
No it doesn't fully work, I had tracked the overwrite down to a situation where realloc had returned a valid pointer but never actually acquired the needed memory, leaving my code to believe it succeeded and carry on as though it had done so, where I had check the return code I switched to checking the memory given against the expected memory and found after a dozen or so realloc's the memory given turned out to be smaller than expected but because I had checked the return code instead of the given bytes I ran into a memory clobber situation, after using mmap() the situation just vanished, and frankly since the ALU will need to be optimised anyways mmap() is the better choice anyways, at least there I can skip assigning memory for the parameters that shouldn't have been appended but prepended to the memory given
Are you sure you don't have a bug resulting in undefined behaviour elsewhere?Originally Posted by awsdert
Having said that, yes, I believe it is true that on Linux malloc and presumably also realloc is not guaranteed to return a null pointer on allocation failure.
Look up a C++ Reference and learn How To Ask Questions The Smart WayOriginally Posted by Bjarne Stroustrup (2000-10-14)