Using malloc instead of calloc

**sangamesh** · 04-24-2012

Hi,

How can I get the functionality of calloc using the malloc function? That is, is there an additional function available in C such that:

calloc = malloc + additional function

Description: The objective is to port (parallelize) a C program package from the CPU to the NVIDIA GPU CUDA architecture. The calloc function used in the program can't be used in CUDA kernels. Is there a way around this other than manually initializing the structure members to zero?

Thanks

**grumpy** · 04-24-2012

The call

Code:

   x = calloc(num, size);

is functionally equivalent to

Code:

   x = malloc(num*size);
   memset(x, 0, num*size);

That memset() call sets every character (byte) in x to zero (the 0 is converted to unsigned char). You can probably guess the approximate CUDA equivalent.

Whether you deem the memset() call as setting structure parameters manually to zero, or as something distinct, is a matter of perspective. It has the same effect as setting all structure parameters, and any padding within the struct, to zero.
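As a rough illustration of the malloc + memset pairing described above, here is a minimal sketch. The name zeroed_malloc is hypothetical (it is not a standard function), and the sketch assumes the caller has already made sure num*size fits in a size_t.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the C library): allocate room for
   num objects of size bytes each and set every byte to zero.
   Assumption: the caller guarantees num * size does not overflow. */
void *zeroed_malloc(size_t num, size_t size)
{
    size_t total = num * size;
    void *p = malloc(total);

    /* Only touch the block if the allocation succeeded. */
    if (p != NULL) {
        memset(p, 0, total);
    }
    return p;
}
```

A call such as zeroed_malloc(100, sizeof(struct foo)) would then stand in for calloc(100, sizeof(struct foo)). In CUDA host code the analogous pairing is cudaMalloc followed by cudaMemset; whether that fits the original program depends on where the allocation happens.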

**kitsune3233** · 04-24-2012

Actually, there's a very big problem with malloc() that you won't see with calloc(), though the problem isn't really a problem if you're not making your structs unusually large and you're not allocating millions of them in one call. If nitems*sizeof(item_type) overflows, malloc() could happily allocate the resulting amount and return a valid pointer, and you could end up corrupting the heap when you start writing to what you think is valid memory. This generally isn't a problem, of course, so you usually don't even worry about it. calloc() most probably checks for an overflow and returns NULL if an overflow would indeed occur. Otherwise it would effectively do as grumpy stated.
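One way to guard against that overflow yourself, rather than hoping calloc() does it, is to test the multiplication before calling malloc(). This is only a sketch; checked_malloc is a made-up name used for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: refuse the allocation when num * size would
   not fit in a size_t, instead of silently allocating a smaller block. */
void *checked_malloc(size_t num, size_t size)
{
    /* num * size overflows exactly when num > SIZE_MAX / size. */
    if (size != 0 && num > SIZE_MAX / size) {
        return NULL;
    }
    return malloc(num * size);
}
```

The division-based test avoids performing the overflowing multiplication at all, so it works the same way whatever the width of size_t.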

Note: C++ programmers should be immune to this issue because they should be using the new[] operator rather than the C memory allocation functions. Otherwise they would have the same problem.

More info at Dev Shed Forums - View Single Post - Scorpy's Puzzle of the Week 2007-07-18 (it was originally a programming puzzle)

**ledow** · 04-24-2012

To be honest, I don't think you can rely on that feature of calloc either. Just because some implementations check it, I don't think it means that all of them do. With either you should be ensuring overflow doesn't happen, so it's no different.

"Most probably" is not something to rely on when playing with memory. And "most probably" any 32-bit C compiler is using unsigned int for size_t, which gives you 4 GB of data in a single malloc call before it overflows. If your program is "accidentally" allocating more than 4 GB in a single call and you don't know about it, you have bigger problems than malloc overflowing, especially on 32-bit-only architectures.

Hell, a malloc+memset or calloc of 4 GB or more in one hit is going to take quite a long time, even on a modern machine.

**kitsune3233** · 04-24-2012

Originally Posted by ledow

To be honest, I don't think you can rely on that feature of calloc either. Just because some implementations check it, I don't think it means that all of them do. With either you should be ensuring overflow doesn't happen, so it's no different.

"most probably" is not something to rely on when playing with memory.

Exactly. And if you don't know your compiler and system well enough to determine whether you should check it, chances are that you should probably be checking anyway.

**claudiu** · 04-24-2012

Originally Posted by kitsune3233

Exactly. And if you don't know your compiler and system well enough to determine whether you should check it, chances are that you should probably be checking anyway.

You should always check everything and never assume anything.

If programming teaches you one thing, it is to be an overly suspicious freak. A direct effect of that is that programmers are harder to fool in real life than the rest of society.

**grumpy** · 04-24-2012

That "very big problem" is not always an issue, kitsune: its significance depends on the relationship between the size_t type and the representation of pointers (both quantities defined by the compiler) and the relationship of both with the memory architecture of the underlying machine.

For example, if the host system is 32-bit (i.e. the machine can address 2^32 bytes, aka 4 GB) then that is an upper limit on the size of addressable memory, in bytes (or char), that any program can allocate. If we have a compiler targeting that host system which supports both a pointer type and a size_t type that are each 32 bits, then the maximum number of objects that can be allocated is 2^32/sizeof(object). It is not possible to create more objects than can be held in the available address space (as distinct from available RAM). If you try, you either get modulo arithmetic in play (when computing num*size) or a failure to allocate the memory (e.g. with a calloc() call).
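To make the arithmetic concrete, the sketch below computes the largest element count that size_t can express for a given object size, and shows what the modulo behaviour does to num*size one step past that limit. Both function names are made up for illustration, and SIZE_MAX / size is the bound imposed by size_t, which may be looser than what the system can really provide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the largest num for which num * size still
   fits in a size_t. Returns 0 for size == 0 to avoid dividing by zero. */
size_t max_elements(size_t size)
{
    return size != 0 ? SIZE_MAX / size : 0;
}

/* Hypothetical helper: num * size exactly as size_t arithmetic
   computes it, i.e. reduced modulo SIZE_MAX + 1. */
size_t wrapped_product(size_t num, size_t size)
{
    return num * size;
}
```

With 4-byte objects, asking for max_elements(4) + 1 of them makes the product wrap around to 0, so malloc(num*size) would be asked for a tiny block rather than an impossibly large one.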

Where things potentially get trickier is if the host system can support much more memory than the compiler and the programs it produces (for example, a 32-bit compiler that generates 32-bit programs to run on a 64-bit host system). Then you can potentially have cases where a malloc(num*size) is subject to an overflow in computing num*size, so a malloc(num*size) and calloc(num, size) can allocate different sized blocks of memory. One potential test of such a case would be that sizeof(size_t) is not equal to sizeof(pointer), bearing in mind that object pointer types are the same size on most mainstream implementations (the C standard does not strictly require this, but a pointer to X and a void pointer can always be implicitly converted to each other).
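The size comparison suggested above could be sketched as follows. The function name is hypothetical, and void * is used here as the representative pointer type; treat the result as a hint about the memory model, not as proof that an overflow can or cannot happen.

```c
#include <stddef.h>

/* Hypothetical check: returns 1 when size_t is narrower than a
   void pointer, i.e. when the address space is wider than the
   largest byte count a single size_t can describe. */
int size_t_narrower_than_pointer(void)
{
    return sizeof(size_t) < sizeof(void *);
}
```

On typical native 32-bit and 64-bit toolchains the two sizes match and this returns 0.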

However, most hosted compilers seek to allow their programs to access all available memory, i.e. a native compiler on a 64-bit system will produce programs that work with a 64-bit address space, and size_t would also be 64 bits. This means the upper limit of num in the calloc() call decreases as the size argument increases.

In other words, if there is a problem with the malloc() call, there is going to be a problem with the calloc() call on most modern architectures, even if the two functions do things differently. To reiterate, it is not possible to create more objects than can be held in the available address space.

This would also affect C++ programmers (in using operator new versus operator new []).

The most common practical exception I can think of where the issue can occur is with virtual machines (which will execute the programs and, in turn, be executed on the host system) where there are different memory models between program and host.

All this is particularly academic with CUDA, since it doesn't even directly support a calloc() equivalent. The issue you describe might eventually emerge if NVIDIA starts routinely creating video cards with more than 4 GB of VRAM. My guess (noting I don't have a crystal ball) is that they're more likely to focus on supporting more CUDA cores before they do that.
