I thought C is faster than Java, it seems I was wrong

**claudiu** · 11-04-2010

Well it largely depends on your actual code. People on this forum tend to suggest nearly optimum solutions most of the time so I would just stick around and read plenty of new threads every day.

**Adak** · 11-04-2010

...or leave your pointers here, thx

I can't leave my pointers, because my program I'm working on, uses them.

Check out big O algorithmic efficiency. They've done a LOT of studying of various algorithms, and have found a way to catalog how efficient most algorithms are. Some of the most important idea's, right off the top of my head are:

1) Choose the best algorithm. You can never make a poor algorithm, work better than a good one, with the same effort. Not even close.

2) Work on optimizing only the functions where your profiler shows your program is spending a lot of time. Spending hours of time to gain a 1/10th of a second, is a huge waste of time. Find the bottlenecks in your program, and concentrate on those.

3) Locality of reference. With modern PC's, remember that data is like food:

In the cache - it's like you have it on the table, in front of you (or at least, in the refrigerator).

In memory - it's like you have to go to the local store and get it, first.

On the drive - it's like you have to drive across town to the big supermarket. (SD's are changing this a bit though).

When you have some data, in cache especially, have your program do EVERYTHING you need to do with that data, right now - if possible.

4) Try to avoid string manipulations. That's easily the slowest to process data type. Think *integers*, for speed. CPU's are highly optimized for them.

5) Bit operations has always been fast. With the 64 bit systems now in PC's, that speed and capability, has reached a good 2-4 X higher. For crucial parts of a bottleneck function, consider them.

6) You went to the planet Lazy2, and became infected with the worst case of lazy-itus. Now you've shrunk down and are able to cruise around inside your PC, like a character in TRON.

Thinking VERY lazily, how would you use the system to efficiently solve the problem?

7) Test. Everybody has idea's about optimizing, but PC's are built way different today, than 10 years ago. We have improved OS's on every major platform, and new peripherals like Static Drives, Flash drives, etc. Network speeds have changed, internet speeds have changed. Most of us, still have the same old idea's of optimizing a program's run time, that we had a decade ago.

Everyone knows that Bubble sort is laughably poor, but did you know that Insertion sort is substantially faster than Quicksort, Heap sort, or Merge sort (generally three of the best sorters around), on small arrays of 30 elements or less?

An indexed search will out search a binary search, any day. Wouldn't be worth it for a few thousand records to search through, but if you have heavy searching, and hundreds of thousands of records, well -- now you're talking!

Use a sorter that is optimized for the data your program will be working with. Sorting is needed in almost every program, at some point - choose a good one, and test it.

8) Try not to work with pointers to variables, when you can work with the variables, directly. Try to use look-up tables, instead of making a series of calculations, whenever possible.

9) Use arrays, not linked lists, unless you can gain some wicked advantage by using them, with your particular problem

10). Accuracy is always #1 priority. Beyond good algorithmic efficiency, keep simplicity and clarity as goals in your code. If you optimize it to the teeth, you'll be pulling out your hair in frustration later, trying to debug it.

**icoigo** · 11-04-2010

I change my C program codes as suggested, no memory allocation any more. I change my Java codes as well, no clone data.
However, Java program takes 2049 milli-seconds while C program takes 2969 milli-seconds. It seems still a lot slower than Java.

Here is my new C program codes:

Code:

long readyGo(int x, int y, char **grid_graph) {
	long count = 0;
	char old_val = *(grid_graph[y] + x);
	*(grid_graph[y] + x) = '1';
	if (y != 0) {
		if (*(grid_graph[y-1] + x) == '3') {
			if (strchr(grid_graph[0], '0') == NULL) {
				*(grid_graph[y] + x) = old_val;
				return 1;
			}
		}
		if (*(grid_graph[y-1] + x) == '0') {
			count += readyGo(x, y - 1, grid_graph);
		}
	}
	if (y != height - 1) {
		if (*(grid_graph[y+1] + x) == '3') {
			if (strchr(grid_graph[0], '0') == NULL) {
				*(grid_graph[y] + x) = old_val;
				return 1;
			}
		}
		if (*(grid_graph[y+1] + x) == '0') {
			count += readyGo(x, y + 1, grid_graph);
		}
	}
	if (x != 0) {
		if (*(grid_graph[y] + x - 1) == '3') {
			if (strchr(grid_graph[0], '0') == NULL) {
				*(grid_graph[y] + x) = old_val;
				return 1;
			}
		}
		if (*(grid_graph[y] + x - 1) == '0') {
			count += readyGo(x - 1, y, grid_graph);
		}
	}
	if (x != width - 1) {
		if (*(grid_graph[y] + x + 1) == '3') {
			if (strchr(grid_graph[0], '0') == NULL) {
				*(grid_graph[y] + x) = old_val;
				return 1;
			}
		}
		if (*(grid_graph[y] + x + 1) == '0') {
			count += readyGo(x + 1, y, grid_graph);
		}
	}
	*(grid_graph[y] + x) = old_val;
	return count;
}

**cyberfish** · 11-04-2010

What compiler are you using?

Did you turn on optimizations?

**icoigo** · 11-04-2010

I use gcc on my iMac, both program(Java and C) are run in terminal. OS is Mac OS X 10.6.4
I don't think I turn on the optimizations option, can you tell me how?

**claudiu** · 11-04-2010

Just look at the help for gcc. You need to use the -O flags.

**cyberfish** · 11-04-2010

gcc -O3 yourcode.c -o yourcode

**icoigo** · 11-04-2010

Cool, after compiling with -O3 option, the C program only takes 1990 milli-seconds to get the result.

**cyberfish** · 11-04-2010

Sounds about right.

In your case C won't be much (any) faster because it's small, numerically intensive code.

The Java dynamic recompiler will recompile it into machine code on runtime anyways.

The JVM also has the advantage of knowing runtime behaviour of your code (which branches are taken more often, etc), allowing it to optimize based on that knowledge.

The C equivalent is profiling then compiling again with the profile (in gcc that's -fprofile-generate and -fprofile-use). How much that helps is highly dependent on the code. I have seen about 5% at most. Sometimes 0. Most of the time somewhere in-between. Not really worth it unless you really want/need the absolute last drop of performance.

**icoigo** · 11-04-2010

Originally Posted by cyberfish

Sounds about right.

In your case C won't be much (any) faster because it's small, numerically intensive code.

The Java dynamic recompiler will recompile it into machine code on runtime anyways.

The JVM also has the advantage of knowing runtime behaviour of your code (which branches are taken more often, etc), allowing it to optimize based on that knowledge.

The C equivalent is profiling then compiling again with the profile (in gcc that's -fprofile-generate and -fprofile-use). How much that helps is highly dependent on the code. I have seen about 5% at most. Sometimes 0. Most of the time somewhere in-between. Not really worth it unless you really want/need the absolute last drop of performance.

What do you mean by "not really worth it"? Those options has any drawback?

**cyberfish** · 11-04-2010

They cost you time. You need to profile it every time you change the code, no matter how minor.

The usual way to do it is to write a benchmark mode for your program, that will do more or less what your program will do in real life (on some typical test data, for example).

Then, in your build script, you can compile the first time with -fprofile-generate, run the automated benchmark, then compile a second time with -fprofile-use.

Usually not worth the hassle. And depending on your benchmark, it can make compilation take much longer. For a program that takes 5 minutes to compile, it will now take more than 10 minutes.

**laserlight** · 11-04-2010

Originally Posted by icoigo

What do you mean by "not really worth it"? Those options has any drawback?

You spend more time working on your program for fewer improvements, i.e., the law of diminishing returns on optimisation sets in.

**new_ink2001** · 11-04-2010

Icoigo,

When I was in algorithm analysis, a very wise professor told us that it is never appropriate to use a recursive implementation. The justification for the statement was elaborated in (computer) architecture class (or the design of Java's virtual machine for that matter). Using some incredibly sized call stack, which included a lot of superfluous data and was time consuming to create was ridiculous, in summary.

Although there are some beautiful definitions for algorithms written recursively, it is necessary for a responsible software engineer to remove recursion from the implementation. I recall from one course, it may have been the same analysis course actually (otherwise it was during a master's program...), that there is always a mapping from a recursive algorithm to a non-recursive equivalent. The various types (head, tail, etc.) lead to clearly defined non-recursive mappings.

So rather than building a huge call stack and bothering the memory controller (which is a little bit slow) for a new stack frame (as a function of the n inputs), in addition to any functional data, at every loop, the engineer could anticipate, acquire and even used some intrinsic information about the nature of the data (as laserlight pointed out three posts ago) before entering the next loop (i.e. recursive equivalent)....

Perhaps the Java compiler engineers were aware of those mappings and "re-wrote" the byte-code to correspond to reasonable non-recursive algorithm? Although it may sound like re-inventing the wheel, building the tread for the terrain is what engineers do right? Gee whiz I really enjoy algorithm analysis.... Let me see if I can find those notes on the mappings. At the moment, I am trying to remember them and it's been a couple years since I looked at the specific material. I was implementing a tournament tree at Walden University the last time I used them...

Best Regards,

New Ink -- Henry

**iMalc** · 11-04-2010

Originally Posted by new_ink2001

When I was in algorithm analysis, a very wise professor told us that it is never appropriate to use a recursive implementation.

No matter how wise you think that statement is, it is simply false.
I never use recursion for something that can be made non-recursive without introducing an explicit stack. Even algorithms that are doubly-recursive, I eliminate at least one of the recursive calls.
However, when the only options remaining are the use of an explicit stack OR a recursive call, there are most definitely appropriate times to use the later. For instance, when it would take more time to code up an explicit stack, for little to no benefit performance an space-wise, or when it would be significantly worse maintenance-wise.

In other words, never say "never".

**Adak** · 11-05-2010

Quicksort has slightly better run times using recursion in my tests, versus an iterative implementation with an explicit stack.

MUCH easier to understand the code, as well.

I wonder if the obstacles to better recursion have been lessened, by newer cpu and cache designs?