A pointer problem from a high school programming contest

**brewbuck** · 04-25-2007

Originally Posted by robatino

Here's a long and detailed thread on the subject:

http://groups.google.com/group/comp....18f6cccd56427d

Here's the relevant passage from that discussion:

The Rationale says the Committee considered defining
the effects at both ends (bilateral dispensations?), but
rejected it for efficiency reasons. Consider an array of
large elements -- structs of 32KB size, say. A system
that actually performed hardware checking of pointer values
could accommodate the one-past-the-end rule by allocating
just one extra byte after the end of the array, a byte that
the special pointer value could point at without setting off
the hardware's alarms. But one-before-the-beginning would
require an extra 32KB, just to hold data that could never
be used ...

I can understand the benefit of an architecture with fine-grained access control to memory. Yes, I absolutely can. But this is still ridiculous. What harm is there in POSSESSING a pointer which points out of bounds? If the architecture really is capable of fine-grained access control, then attempting to use this pointer will result in a fault ANYWAY. This is a completely arbitrary restriction that gives no real benefit whatsoever, while disallowing lots of completely reasonable code.

And get this: the standard specifically ALLOWS an uninitialized pointer. So somehow, you're going to have to tell the bounds-checking hardware "Hey, ignore this one particular pointer -- it's okay according to the C standard." Which of course forces language-specific concerns into the hardware, which is completely idiotic.

**dwks** · 04-25-2007

And get this: the standard specifically ALLOWS an uninitialized pointer. So somehow, you're going to have to tell the bounds-checking hardware "Hey, ignore this one particular pointer -- it's okay according to the C standard." Which of course forces language-specific concerns into the hardware, which is completely idiotic.

It would be relatively easy for a compiler to make uninitialized pointers point to NULL or something if the hardware didn't allow uninitialized pointers, but it would be difficult to do the same sort of thing for one-before-the-start pointers. Just a thought.

**Salem** · 04-25-2007

Initialisation requires no prior knowledge of the old (undefined) contents.

So saying
p = NULL;
neither loads p into a machine register, nor attempts to dereference it.

p--;
on the other hand has the potential for all sorts of problems (of the undefined variety), if p steps off the beginning of an array, or in this case no longer points at a valid object.

What if said decrement resulted in address underflow, and an immediate trap without even trying to dereference it? The "it hasn't happened to me yet" defence isn't good enough.

You could argue that in this specific program that there won't be a problem, but it still remains undefined (remember, undefined includes producing the correct answer). But one specific example isn't good enough to extrapolate every single instance, in every single program, compiled with every single compiler for any real or abstract machine.

Simply having an uninitialised pointer (or anything else for that matter) in scope isn't undefined behaviour. What you do next however is, if that thing isn't initialisation.
It would be just as dumb to assume that some floating point operation using an uninitialised float would be just as harmless.

**brewbuck** · 04-25-2007

Originally Posted by Salem

What if said decrement resulted in address underflow, and an immediate trap without even trying to dereference it? The "it hasn't happened to me yet" defence isn't good enough.

I know of no architecture that would do that. So I admit that the code I write is actually a more restrictive version of C/C++ than the standard would indicate -- the code wouldn't work on such an architecture. But I really don't care. It's no different than writing programs that call getch(). Obviously, such programs are restricted to platforms where getch() is available. Whereas my code is restricted to platforms that don't fault when an invalid pointer value is loaded into a variable.

As I've stated in the past, portability is a spectrum. I understand that some of the code I write is not strictly standard C. But the reality is, it works everywhere. And I'm not talking "I tested it on Windows and Linux." A platform where it didn't work is a platform I would avoid. A system where simply loading an invalid pointer value into a register causes a fault is just plain dumb design.

It would be just as dumb to assume that some floating point operation using an uninitialised float would be just as harmless.

If you look at the newsgroup thread that spawned this discussion, there are others who agree with me. This particular facet of the standard is just plain stupid and largely irrelevant. Don't assume that because I choose to ignore this particular thing that I have no clue what portability means.

**robatino** · 04-25-2007

I understand the need to sometimes compromise portability for functionality or efficiency. But can you give an example of a situation where allowing invalid pointer assignments is necessary for this? Loops can normally be trivially modified to avoid them with no loss of efficiency.

Edit: A typical example is a backwards loop. Most people would write it like

Code:

for (p=c.end()-1; p >= c.begin(); --p) {
  // code
}

but this generates an invalid iterator at the last iteration. In fact, if the container has zero length, even the initial assignment is invalid. But it can be rewritten as

Code:

for (p=c.end(); p != c.begin();) {
  --p;
  // code
}

The trick is that the decrement has to be at the beginning of the body, not the end, so --p has to be separate from the rest of the for statement. But it's just as efficient as the other version.

Edit: Actually, I think only random-access iterators can be compared with >=, and only bidirectional or random-access iterators can be decremented. But the point holds when using arrays and ordinary pointers.

**verbity** · 04-25-2007

It's nice to see that my ignorance can bring all of you into a constructive argument about all this!!!!!

In the end this is what I did:

Code:

void Case::print()
{
	char* templwr = new char[m_length + 1];
	strcpy(templwr, m_buf);

	m_lower = templwr;

	cout << m_name << " \""<< m_buf << "\"  String Length: " << getLength() << endl;
	csis << m_name << " \""<< m_buf << "\"  String Length: " << getLength() << endl;
	
	cout << "Lower Case: " << strlwr(m_lower) <<  endl;
	csis << "Lower Case: " << strlwr(m_lower) <<  endl;

	char* tempupr = new char[m_length + 1];
	strcpy(tempupr, m_buf);
	m_upper = tempupr;

	cout << "Upper Case: "<< strupr(m_upper)<< endl;

	csis << "Upper Case: "<< strupr(m_upper)<< endl;


	delete[]templwr;
	delete[]tempupr;

}

And the variables were protected because I needed to access them in this class which is a derived class of String.....

**brewbuck** · 04-25-2007

Originally Posted by robatino

I understand the need to sometimes compromise portability for functionality or efficiency. But can you give an example of a situation where allowing invalid pointer assignments is necessary for this? Loops can normally be trivially modified to avoid them with no loss of efficiency.

Here is the basic idiom:

Code:

char *start, *end;
start = array - 1; /* Oh no, the HORROR! */
end = array + length;
while(++start < end)
{
    /* Do something */
}

This is called "backward biasing," at least where I come from. Why do it this way? Because the value of "start" is the same, both inside the condition and inside the loop. Why does it matter? Reduction of confusion. Try writing a few hundred thousand lines of code and it will become clear. When this same pattern occurs hundreds or thousands (literally) of times in a code base, you aren't going to go through and change everything because of some "hypothetical" architecture where it could be problematic.

There is some funny guy on this board who likes to claim that "walking a pointer is more efficient than indexing an array." This did, in fact, used to be true. And so you see the above pattern in legacy code. All... over... the... place.

It's not like I'm referencing an undefined pointer. It's a syntactic convenience for crap's sake. I'm not giving it up because the standard says so.

**quzah** · 04-25-2007

Code:

for( start = array,  end = array + length; start < end; start++ )

To each his own I suppose.

Quzah.

**brewbuck** · 04-25-2007

Originally Posted by quzah

Code:

for( start = array,  end = array + length; start < end; start++ )

To each his own I suppose.

Quzah.

I agree. But I didn't write every line of code I'm responsible for. The fact is, this pattern is everywhere. New code I write usually just uses a for-loop over the index. But I sure as heck am not wasting two years "fixing" all the instances of this.

**laserlight** · 04-25-2007

There is some funny guy on this board who likes to claim that "walking a pointer is more efficient than indexing an array." This did, in fact, used to be true. And so you see the above pattern in legacy code. All... over... the... place.

Actually, I think that discussion was deleted when we had the recent rollback of the forums, heheh.

The fact is, this pattern is everywhere.

I had the impression that the for loop version was clearer and more obvious, even excluding the begin/end idiom in the standard library. Then again, maybe my youth (i.e., less exposure to weird "legacy" stuff) has given me a bias to see the loop as clearer and more obvious.

**quzah** · 04-25-2007

Code:

#define while(x) for( ; x ; )

That'll get you started.

I've done all the hard work, I'll leave the rest up do you!

Quzah.

**robatino** · 04-25-2007

I've never seen backward biasing - to me it looks far more confusing than quzah's version, which is basically how I've always seen forward loops done, and which doesn't generate invalid pointers. And if the increment is inside a for loop, instead of a while loop, it gets done at the end of the body instead of the beginning, which means that start has to be initialized to array instead of array-1.

Edit: Then again, I've had the luxury of not having to maintain legacy code. But it's one thing to not want to change thousands of lines of other people's legacy code, and another to write such code yourself.

**brewbuck** · 04-25-2007

Originally Posted by robatino

I've never seen backward biasing - to me it looks far more confusing than quzah's version, which is basically how I've always seen forward loops done, and which doesn't generate invalid pointers.

It wasn't invented for clarity's sake

I don't claim to have invented it, but I deal with code that uses it on a daily basis. I am not keen on fixing what isn't broke. To me it reads naturally, but as Quzah said, to each his own.

Also, consider that a lot of legacy C code was "hand translated" from other languages during the 80's and early 90's. So a lot of this stuff is a result of directly translating idioms from one language to another, which ends up looking funny. But this particular pattern isn't rare by any means.

**691175002** · 04-25-2007

Well I am quite new to the entire pointer C++ thing, so this seemed like good practice, can someone tell me if I am following through the program right?

Code:

void g (int i) { //Passes a copy of the inputted number?
   int*p; //Creates a pointer
   p=&i; //p Points to the address of i (which is a copy of the original number?);
   *p--; //Points to a non-existing address;
}

void h (int*p) { //Takes a reference to the inputted varible
   *p--; //*Doesnt* deincrement variable since -- is resolved before *
}

void i (int*p) { //Takes a reference to the inputted varible
   int i; //Creates a local varible i
   i=*p; //Assigns i the value of the inputted varible (why was it assigned to p in the first place?)
   i--; //decrements the local copy
}

int main() {
   int index=100;
   for(;index>0;index--) {  //Standard for loop.
      printf("*"); 
      f(index);
      g(index);
      h(&index);
      i(&index); 
   }
   //Prints * 100 times since none of the functions do anything
//Talk about a dumb program
}

**laserlight** · 04-25-2007

which is basically how I've always seen forward loops done, and which doesn't generate invalid pointers

What about reverse loops? I think that a naive implementation of reverse iterators for certain standard containers could end up doing something like:

Code:

for (rbegin = begin + length - 1, rend = begin - 1; rbegin != rend; --rbegin)

Prints * 100 times since none of the functions do anything

That's correct. None of the four functions (main() excluded) has any net effect, assuming that this invalid pointer thing also has no net effect.

Thread: A pointer problem from a high school programming contest

Thread Tools

Search Thread

Display

Similar Threads

pointer to pointer realloc problem

Double Pointer to a data type that is already a pointer!

A problem with pointer initialization

Problem with function's pointer!

towers of hanoi problem