I would suggest you also write a simple program and try running gprof on that.
E.g.
Code:
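#include <string.h>

/* strlen(src) is evaluated again on every loop test */
void copy1 ( char *dest, const char *src ) {
    size_t i;
    for ( i = 0 ; i <= strlen(src) ; i++ ) dest[i] = src[i];
}
/* strlen(src) is evaluated once, before the loop */
void copy2 ( char *dest, const char *src ) {
    size_t i, len = strlen(src);
    for ( i = 0 ; i <= len ; i++ ) dest[i] = src[i];
}
/* no strlen() at all; stops once the '\0' has been copied */
void copy3 ( char *dest, const char *src ) {
    size_t i = 0;
    while ( (dest[i] = src[i]) != 0 ) i++;
}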
gprof gives you a lot of info, so it takes time to learn how to use that data effectively.
What you're basically looking for are functions whose call counts are much larger than the size of the input problem, or whose times seem much larger than the amount of work being done.
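To make the "call counts much larger than the input" idea concrete, here is a rough sketch. The names my_strlen, my_strlen_calls, copy_slow, copy_fast and count_calls are all made up for illustration; my_strlen is a hand-written stand-in for strlen() so that the calls are counted under a name of your own, and the counter only mimics the kind of number you would read off the "calls" column of a profile.

```c
#include <stddef.h>

/* Hypothetical stand-in for strlen(), with a counter that plays the
   role of the per-function call count a profiler would report. */
static unsigned long my_strlen_calls = 0;

size_t my_strlen(const char *s) {
    size_t n = 0;
    my_strlen_calls++;
    while (s[n] != '\0') n++;
    return n;
}

/* Length is recomputed on every loop test. */
void copy_slow(char *dest, const char *src) {
    size_t i;
    for (i = 0; i <= my_strlen(src); i++) dest[i] = src[i];
}

/* Length is computed once, before the loop. */
void copy_fast(char *dest, const char *src) {
    size_t i, len = my_strlen(src);
    for (i = 0; i <= len; i++) dest[i] = src[i];
}

/* Run one copy and report how many times my_strlen() was called. */
unsigned long count_calls(void (*copy)(char *, const char *),
                          char *dest, const char *src) {
    my_strlen_calls = 0;
    copy(dest, src);
    return my_strlen_calls;
}

/* To look at the same thing with gprof (assuming gcc), call the copy
   functions from your own main(), then:
       gcc -pg prog.c -o prog
       ./prog                  (writes gmon.out)
       gprof prog gmon.out
   With optimisation turned on, small functions may be inlined and
   drop out of the call counts. */
```

For a 5-character string, copy_slow calls my_strlen once per loop test and copy_fast calls it once in total. One call per character of input is the sort of count that should make you suspicious.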
Oh, and one more thing.
You'll need an effective test suite before you start messing about with the code trying to make it quicker.
After each "improvement", you need a way to be able to quickly tell whether it still WORKS, regardless of whether it is any quicker or not.
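A test suite for something this small need not be elaborate. The following is only a minimal sketch of the idea: copy_fn, copy_under_test and check_copy are made-up names, and the list of test strings is just an assumption about what is worth checking. Swap in whichever version of the copy function you are currently tinkering with.

```c
#include <stdio.h>
#include <string.h>

typedef void (*copy_fn)(char *dest, const char *src);

/* Hypothetical function under test; replace with your own version. */
void copy_under_test(char *dest, const char *src) {
    size_t i = 0;
    while ((dest[i] = src[i]) != '\0') i++;
}

/* Runs every test string through the copy and returns the number of
   failures, so 0 means "still works". */
int check_copy(copy_fn copy) {
    static const char *cases[] = {
        "", "a", "hello", "two words", "trailing space "
    };
    int failures = 0;
    size_t k;
    for (k = 0; k < sizeof cases / sizeof cases[0]; k++) {
        char dest[64];
        size_t len = strlen(cases[k]);
        memset(dest, '#', sizeof dest);  /* fill, to spot an overrun */
        copy(dest, cases[k]);
        /* All len+1 bytes (including the '\0') must match, and the
           byte after the terminator must be untouched. */
        if (memcmp(dest, cases[k], len + 1) != 0 || dest[len + 1] != '#') {
            printf("FAIL: \"%s\"\n", cases[k]);
            failures++;
        }
    }
    return failures;
}
```

Call check_copy(copy_under_test) from main() after every change and only bother looking at the timings if it returns 0.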