Thread: Don't understand why valgrind is giving an error here.

  1. #1
    Registered User
    Join Date
    May 2010
    Posts
    269

    Hi,
    I've looked through the valgrind documentation, but I just can't see anything wrong with this line of code:

    Code:
        
    int* 
    get_replica_ids(const Replica *replicas, int num_replicas)
    {
        int *ids = (int*)malloc(sizeof(int) * num_replicas);
        if(!ids){
            fprintf(stderr, "Could not allocate memory for id array.\n");
            return NULL;
        }
    
        int i;
        for(i = 0; i<num_replicas; i++){    
            ids[i] = replicas[i].replica_id;
        }
        return ids;
    }
    It's giving an error on the if(!ids) line.

    The error is

    Address 0x6a43f38 is 0 bytes after a block of size 8 alloc'd
    Any idea?

  2. #2
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    I would guess memory corruption.
    Are you freeing the pointer returned from this function appropriately?
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  3. #3
    Registered User ledow's Avatar
    Join Date
    Dec 2011
    Posts
    435
    I would agree. The memory is allocated on the line just above it, but that doesn't mean it is ever freed afterwards. Valgrind just knows where it came from, not what happened to it. The "block of size 8" would suggest that the pointer returned from this function is not freed when "sizeof(int) * num_replicas" = 8, if that's any help. Depending on your sizeof(int), it will tell you what num_replicas was (probably 2 or 1 depending on 32/64 bit) when the block was allocated, which will help you work out what happened to the pointer from this function.

    - Compiler warnings are like "Bridge Out Ahead" warnings. DON'T just ignore them.
    - A compiler error is something SO stupid that the compiler genuinely can't carry on with its job. A compiler warning is the compiler saying "Well, that's bloody stupid but if you WANT to ignore me..." and carrying on.
    - The best debugging tool in the world is a bunch of printf()'s for everything important around the bits you think might be wrong.

  4. #4
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    A value of num_replicas being zero will possibly cause that complaint from valgrind. The solution is to call the function with a non-zero value for num_replicas.

    Also try removing the type conversion (the "(int *)"). It is not required, and can obscure some programming errors that would cause valgrind to report a problem. If your code does not compile when you do that, then you have found a probable cause of your problem - which would be a missing #include <stdlib.h>
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  5. #5
    Registered User
    Join Date
    May 2010
    Posts
    269
    Quote Originally Posted by grumpy View Post
    A value of num_replicas being zero will possibly cause that complaint from valgrind. The solution is to call the function with a non-zero value for num_replicas.

    Also try removing the type conversion (the "(int *)"). It is not required, and can obscure some programming errors that would cause valgrind to report a problem. If your code does not compile when you do that, then you have found a probable cause of your problem - which would be a missing #include <stdlib.h>
    stdlib.h is included. It still gives the error "error: a value of type "void *" cannot be used to initialize an entity of type "int *"" when (int*) is removed. It's not a standard gcc compiler; it's a Portland Group compiler running on Compute Node Linux.

    The number of replicas in this simple test is 2. I suppose I could put some checks around that, but if it's 0, something major has happened and it's better that it crashes anyway.
    Last edited by dayalsoap; 03-01-2013 at 10:41 PM.

  6. #6
    Registered User
    Join Date
    May 2010
    Posts
    269
    Quote Originally Posted by ledow View Post
    I would agree. The memory is allocated on the line just above it, but that doesn't mean it is ever freed afterwards. Valgrind just knows where it came from, not what happened to it. The "block of size 8" would suggest that the pointer returned from this function is not freed when "sizeof(int) * num_replicas" = 8, if that's any help. Depending on your sizeof(int), it will tell you what num_replicas was (probably 2 or 1 depending on 32/64 bit) when the block was allocated, which will help you work out what happened to the pointer from this function.
    It's generated once at startup and is messaged around accordingly. It's not freed as it lives for the duration of the job.

  7. #7
    Registered User
    Join Date
    May 2010
    Posts
    269
    Quote Originally Posted by iMalc View Post
    I would guess memory corruption.
    Are you freeing the pointer returned from this function appropriately?
    It's not freed as it's generated once at startup and is kept for the remainder of the job and is shipped around as needed.

  8. #8
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Well, as far as I'm concerned, you're on your own now.

    You are using a compiler that is apparently not compliant with the C standard - information that is vital if you are seeking help in a C forum to avoid wasting people's time - but you didn't bother to offer that information.

    You have provided no information about the context in which the function is called. But that context is almost certainly a contributor to your problem.


    I suppose some people might enjoy playing "blind man's bluff" with you. I'm not one of them. Good luck in solving your problem, but I'm not providing any more help.

  9. #9
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > It still gives the error "error: a value of type "void *" cannot be used to initialize an entity of type "int *"" when (int*) is removed. It's not a standard gcc compiler;
    No, complaining about void* to int* conversions is C++'s forte.

    > Address 0x6a43f38 is 0 bytes after a block of size 8 alloc'd
    I suppose we're supposed to take it on faith that this really is the block you allocated, and that it's not, say, an array overrun on the array you're copying from.

    > ids[i] = replicas[i].replica_id;
    Is replica_id an int?
    Are there any attributes on the replicas struct, such as packing?

    In short, post
    a) a whole program which demonstrates the problem - all you need is the struct, a 1-line main and this function.
    b) your complete valgrind report.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  10. #10
    Registered User
    Join Date
    May 2010
    Posts
    269
    Quote Originally Posted by grumpy View Post
    Well, as far as I'm concerned, you're on your own now.

    You are using a compiler that is apparently not compliant with the C standard - information that is vital if you are seeking help in a C forum to avoid wasting people's time - but you didn't bother to offer that information.

    You have provided no information about the context in which the function is called. But that context is almost certainly a contributor to your problem.


    I suppose some people might enjoy playing "blind man's bluff" with you. I'm not one of them. Good luck in solving your problem, but I'm not providing any more help.
    Actually, it *is* standard compliant. It's actually the defacto for any mainstream HPC work. I'm sorry you tried to be pedantic, but it bit you in the face because you really aren't an expert. Sorry.

  11. #11
    Registered User
    Join Date
    May 2010
    Posts
    269
    > I suppose we're supposed to take it on faith that this really is the block you allocated, and that it's not, say, an array overrun on the array you're copying from.
    Could be. Valgrind listed that specific line as the problem, though.

    > Is replica_id an int?
    Yes it is.

    > your complete valgrind report.
    Not sure if it will help you, as it's an MPI job running on about 50k cores. Having the complete valgrind report wouldn't be helpful, I don't think. If you want it, I could attach it.

  12. #12
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by dayalsoap View Post
    Actually, it *is* standard compliant. It's actually the defacto for any mainstream HPC work. I'm sorry you tried to be pedantic, but it bit you in the face because you really aren't an expert. Sorry.
    Arrogant ........, aren't you. Welcome to my ignore list.

    If giving advice about standard C in a C forum is being pedantic - when you didn't bother to provide relevant information to avoid wasting the time of others who might try to help you - then so be it.

    A "defacto" for HPC work is not the same as standard.

  13. #13
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > Not sure if it will help you, as it's an MPI job running on about 50k cores. Having the complete valgrind report wouldn't be helpful, I don't think.
    I was kinda hoping for just a simple test case (you know main + function + minimal bits to make it work), and the valgrind report from that, which should have been just a handful of lines.
    You know, something WE could just download, compile and test for ourselves on a range of different platforms and compilers, to try and figure out where the issue really is.

    But if you've got dozens of pages of valgrind issues, don't you have better ones to look at than "0 bytes past the end" which only happens once at startup?

  14. #14
    Registered User
    Join Date
    May 2010
    Posts
    269
    Quote Originally Posted by Salem View Post
    > Not sure if it will help you, as it's an MPI job running on about 50k cores. Having the complete valgrind report wouldn't be helpful, I don't think.
    I was kinda hoping for just a simple test case (you know main + function + minimal bits to make it work), and the valgrind report from that, which should have been just a handful of lines.
    You know, something WE could just download, compile and test for ourselves on a range of different platforms and compilers, to try and figure out where the issue really is.

    But if you've got dozens of pages of valgrind issues, don't you have better ones to look at than "0 bytes past the end" which only happens once at startup?
    Well, the problem, I guess, is the way the Replica objects are created. It's a feature of the Open Community Runtime. I'm not sure I'd feasibly be able to create Replica instances without it.

    I was hoping the problem would be simple to fix, but I guess with OCR + MPI + Gemini messaging layers, there's a lot that can go wrong, i.e., it's probably not just a stupid syntax error on my part.

    Thanks for your help, anyways.

  15. #15
    Registered User
    Join Date
    May 2010
    Posts
    269
    Quote Originally Posted by grumpy View Post
    Arrogant ........, aren't you. Welcome to my ignore list.

    If giving advice about standard C in a C forum is being pedantic - when you didn't bother to provide relevant information to avoid wasting the time of others who might try to help you - then so be it.

    A "defacto" for HPC work is not the same as standard.
    The problem is you assume there's only one single standard. You live in your Webpage + Database world where gcc is king. That's fine and good, but don't try to be pedantic just for the sake of hearing yourself talk.

    Your little pedantic suggestion won't even touch the problem anyways.
