First off, trying to time anything that calls printf is a fool's errand.
You're trying to measure an ant-fart in a hurricane.
Second, it's basically the same code whichever way you write it.
Rather than trying to reason about the instruction count, just look at the generated assembly and count the instructions directly.
Code:
#include <stdio.h>
void foo ( int numbers[], int len ) {
    for (int i = 0; i < len; i++) {
        printf("%d\n", numbers[i]);
    }
}
void bar ( int numbers[], int len ) {
    for (; len > 0; --len) {
        printf("%d\n", *numbers++);
    }
}
Then compile with -S (which makes gcc stop after generating the assembly) and other flag(s) of your choice.
Code:
$ gcc -S foo.c
$ less foo.s
$ gcc -S -O2 foo.c
$ less foo.s
By the time the optimiser has had a go at them both, the loops are essentially the same.
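If you want to compare just the loops, without the printf call in the way, here is a minimal sketch. The names sum_index and sum_pointer are made up for illustration; they keep the loop shapes of foo and bar but only add up the elements.

```c
/* Hypothetical printf-free variants of foo and bar:
   same loop shapes, but the body only accumulates a sum. */

/* Indexed loop, as in foo. */
int sum_index ( const int numbers[], int len ) {
    int total = 0;
    for (int i = 0; i < len; i++) {
        total += numbers[i];
    }
    return total;
}

/* Count-down loop with a walking pointer, as in bar. */
int sum_pointer ( const int numbers[], int len ) {
    int total = 0;
    for (; len > 0; --len) {
        total += *numbers++;
    }
    return total;
}
```

Save it as, say, sum.c (the filename is arbitrary), run gcc -S -O2 sum.c and compare the two function bodies in sum.s. I'd expect them to come out near enough identical, but check with your own compiler and flags rather than taking my word for it.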
The code you write should follow what you would naturally write.
In that respect,
for (int i = 0; i < len; i++) has far fewer surprises than
for (; len_cpy > 0; --len_cpy)
If you try to be too clever, you might just end up confusing the optimiser and coming out worse than if you'd just stuck to the plain and simple.
Focus your efforts on choosing the best algorithms and data structures for the problem at hand, and leave the micro-management to the compiler.
No amount of loop reformatting or compiler optimisation flags will save you if you choose 'bubblesort' instead of 'quicksort'.
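To put a rough number on that, here is a sketch that counts comparisons instead of timing anything. The function names and the counter are my own invention for illustration. Note also that the library qsort is not required to be a quicksort; it stands in here for "a decent general-purpose sort".

```c
#include <stdlib.h>

static long comparisons;   /* bumped every time two elements are compared */

/* Comparator for qsort that also counts how often it is called. */
static int cmp_int ( const void *a, const void *b ) {
    int x = *(const int *)a, y = *(const int *)b;
    comparisons++;
    return (x > y) - (x < y);
}

/* Plain bubble sort with no early exit; returns the number of comparisons. */
long bubble_sort_count ( int numbers[], int len ) {
    comparisons = 0;
    for (int i = 0; i < len - 1; i++) {
        for (int j = 0; j < len - 1 - i; j++) {
            comparisons++;
            if (numbers[j] > numbers[j + 1]) {
                int tmp = numbers[j];
                numbers[j] = numbers[j + 1];
                numbers[j + 1] = tmp;
            }
        }
    }
    return comparisons;
}

/* Library qsort; returns the number of comparisons it made. */
long qsort_count ( int numbers[], int len ) {
    comparisons = 0;
    qsort(numbers, (size_t)len, sizeof numbers[0], cmp_int);
    return comparisons;
}
```

This bubble sort always makes len*(len-1)/2 comparisons, so 499500 for 1000 elements. What qsort needs depends on your C library, but on typical implementations it should be in the region of len*log2(len), which is a gap no loop tweak is going to close.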