# Thread: How efficient is this code?

1. ## How efficient is this code?

Below is a function which takes a string and an integer as it's parameters, and returns a string. It's purpose is to remove the character at the position denoted by the integer, and return the remaining characters in the form of another string.

Code:
```char *
extract_letter( char *word, int letter )
{
int ctr;
char *str;

str = (char *) malloc( strlen( word ) - 1);

for( ctr = 0 ; ctr < letter ; ctr++ )
str[ctr] = word[ctr];

for( ctr = letter ; ctr < strlen( word ) ; ctr++ )
str[ctr] = word[ctr+1];

return( str );
}```
The calling function ensures that the value of (int letter) always relates to a valid position in (char *word).

The problem I've got is that this function is being called many many times during the excution of the program (potentially hundreds of millions of times), and I need to make sure that it is as fast as possible. Can anyone suggest a way of improving the speed of this function?

Also, I'm just guessing that this function might be a bottleneck, as I don't know how to work out what percentage of time the program takes to execute each part. Does anyone know of a way to measure this?

Thanks.

2. >Can anyone suggest a way of improving the speed of this function?
Don't bother until you're sure that there's a performance problem. Optimizing this function would probably result in it being harder to understand. However, one quick and easy way to make this function faster while also making it safer is to avoid dynamic allocation by passing in a buffer for the new word:
Code:
```char* extractLetter(char* word, char* new_word, size_t index)
{
size_t i;

for (i = 0; i < index; i++)
new_word[i] = word[i];

for (i = index; word[i] != '\0'; i++)
new_word[i] = word[i + 1];

return new_word;
}```
This way you don't have the expensive call to malloc, you don't call strlen with every iteration of the second loop, and you don't have to worry about releasing the memory later in the program.

>I'm just guessing that this function might be a bottleneck
It's been repeatedly shown that programmers are notoriously bad at guessing where bottlenecks are.

>Does anyone know of a way to measure this?
The best way is to use a profiler to get percentages during execution.

>str = (char *) malloc( strlen( word ) - 1);
You don't have to cast malloc in C, it's also considered a bad thing because it can hide the error of not including stdlib.h. Also, since the size of word is strlen(word) + 1, the correct size after removing a single character would be strlen(word), as it is you have an off-by-one error that may or may not cause problems later. Probably when you call free.

>for( ctr = letter ; ctr < strlen( word ) ; ctr++ )
It's not a good idea to include function calls as counting loop conditions if you can avoid it easily. This loop would be slowed considerably by the repeated overhead of calling strlen and the time it takes to find the length of word. A better way would be to either work with nul characters as strlen probably does, or call strlen and save the value:
Code:
```int len = strlen(word);

...

for( ctr = letter ; ctr < len ; ctr++ )```

3. Thanks for the advice - but... a couple more questions.

Is there any reason that you chose to use size_t as opposed to int?

...and...

If the function takes (char *new_word) as a parameter, do you still need to (return new_word? Wouldn't it be the same to declare the function as void, and just alter the value of *new_word? Also, are you implying that the malloc for the new_word is placed in the calling function, or is there some way of avoiding the malloc entirely, even though the length of (char *word) is not known until run-time?

By the way, could you recommend a profiler for use on Solaris?

Cheers,
Andy

4. >Is there any reason that you chose to use size_t as opposed to int?
The function doesn't work with negative indices and size_t is to be preferred over int for array indicies if you can manage it.

> do you still need to (return new_word)?
No, but it makes setting a pointer to the beginning of the array slightly simpler. You can do
Code:
```char* p;

...

p = extractLetter(word, new_word, i);```
Code:
```char* p;

...

extractLetter(word, new_word, i);
p = new_word;```
It can also be used for error checking. If something goes wrong you can return NULL. You'll notice that fgets does the same thing; the buffer doesn't need to be returned, but doing so adds a lot of useful flexibility to the function.

>Also, are you implying that the malloc for the new_word is placed
>in the calling function, or is there some way of avoiding the
>malloc entirely, even though the length of (char *word) is not known until run-time?
If the length of word isn't known until run-time and you can't determine a safe upper limit, you'll need to use malloc. However, by placing the onus of allocating memory on the calling function you can implement useful optimizations such as only calling a memory allocation routine if the current size of word is longer than the new_word buffer. This wasn't possible with your previous function.

>By the way, could you recommend a profiler for use on Solaris?
Take a look at prof in your man pages.

5. I've just run prof against the program, and found that over 50% of the run-time is taken up by a process called _brk_unlocked. The 'largest' user function only takes 10% of run-time.

I've tried searching google for _brk_unlocked, but not found much of use. I gather it's something to do with memory allocation for variables, but can anybody shed any more light? ...and is there anything I can do to improve it?

Thanks.

6. I followed the advice from Brighteyes, and also realised that I did know the upper limit of the new_word variable, so I've actually removed all explicit calls to malloc(). Here is the adjusted code...

Code:
```#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "permutations.h"

void
permute( char *input,
char *pool )
{
char output[MAXLENGTH] = "";
char new_word[MAXLENGTH] = "";
int ctr, target, pool_len;

strcat( output, input );
target = strlen( output );
pool_len = strlen( pool );

for( ctr = 0 ; ctr < pool_len ; ctr++ )
{
output[target] = pool[ctr];

append_perm( output );
permute( output, extract_letter( pool, new_word, pool_len, ctr ) );
}
}

char *
extract_letter( char *word, char *new_word, int word_len, int letter )
{
int ctr;

for( ctr = 0 ; ctr < letter ; ctr++ )
new_word[ctr] = word[ctr];

for( ctr = letter ; ctr < word_len ; ctr++ )
new_word[ctr] = word[ctr+1];

return( new_word );
}```
It is called from a main in another program file...

Code:
```#include <stdio.h>
#include "permutations.h"

struct node *tail_perm = NULL;

int
main( int argc, char *argv[] )
{
if( argc != 2 )
{
printf( "\nUsage : andy <string>" );
return( 0 );
}

permute( "", argv[1] );
printf( "\n\n" );

return( 0 );
}```
...and finally here is the top of the output from prof
Code:
``` %Time Seconds Cumsecs  #Calls   msec/call  Name
48.5   28.36   28.36                      _brk_unlocked
13.0    7.57   35.9379144928      0.0001  _mcount
11.7    6.82   42.75 9864101      0.0007  permute
4.6    2.70   45.45 9864100      0.0003  extract_letter
4.0    2.32   47.77 9864100      0.0002  malloc
3.7    2.19   49.9619728202      0.0001  strlen
3.7    2.17   52.13                      realloc
3.0    1.75   53.88 9864100      0.0002  strcpy
2.5    1.44   55.32 9864100      0.0001  append_perm
2.3    1.33   56.65 9864101      0.0001  strcat
1.6    0.91   57.56                      _free_unlocked
0.9    0.53   58.09                      _fileno_unlocked
0.0    0.01   58.42  116050      0.0001  _sbrk_unlocked
0.0    0.01   58.43  116050      0.0001  _sbrk```
As you can see, nearly two thirds of the excution time is taken up by _brk_unlocked and _mcount.

I don't want to try and do anything complicated with the memory, I was just wondering if you could see anything daft that I'm doing which might be unnecessary.

Thanks,
Andy

7. Sorry, I had not included the code for the append_perm function, and it is that which contains the calls to malloc(). I recompiled the code with append_perm rem'd out, and nearly all of the calls to _brk_unlocked disappeared.

I have attached the code files - the intention is to generate a list of every permutation of a string supplied at the command line. I thought I had come up with quite a neat little recursive function to do this, but I'm struggling to keep the run-time down if the string length exceeds 10 characters.

If anyone is interested in trying to improve the code, please let me know your ideas. Also, do you know of a more efficient approach to permutation generation?