[Effective-cpp] Item 1: Uses and Abuses of vector

Balog Pal pasa at lib.hu
Wed Oct 27 08:31:12 EDT 2004


From: "Herb Sutter" <hsutter at microsoft.com>


> >It may be low, but that still breaks the 'not pay for what you not use'
> >rule.
>
> Why is that? You don't have to use the container.

That is the best argument. ;-)    I recall the FAQ: why is there no
auto_ptr_array in the standard, and the answer is: do not use arrays, auto or
anything else, use vector instead.  Which makes sense.   Until you attach a
mandatory bounds check to op[].

As Bruce Schneier keeps preaching, security is a tradeoff, and careful
analysis is necessary for any proposal -- if you introduce something that is
indeed more secure but inconvenient for some users, they may sidestep the
new measure, and in doing so pick something even less secure.

(I don't claim here that it will surely happen, just that the possibility is
there and must be taken into account in the analysis.)

> >But the real problem is not the cost of the check, but the action if
> >the range problem is discovered.
> >
> >Leaving it undefined gives way to do anything *good* for a quality
> >implementation,
>
> I assume you mean requiring range checking but leaving undefined what
> happens if a range check is violated? That is essentially synonymous to
> the status quo, because it means nothing to require a range check
> without specifying that the behavior is something other than undefined.

No, I just described the current situation, where most standard library
implementations do (IMHO) sane and useful things -- instead of formatting the
hard disk or launching a nuke.  That holds completely for debug builds; for
release builds it leaves the point open to debate, as a frequently picked
option is to use the built-in op[] on the allocated block -- with the UB of
that operation.

> Undefined behavior is the worst of all possible behaviors. Anything,
> even unspecified or implementation-defined behavior, is better than
> that.

In short, I agree.

> >like asserting in debug, and no check in release build, or
> >terminate() in release build, or make it tunable by switch, then still
> >remain conforming.
> >
> >Defining the behavior would take that freedom away.   And force
> >something possibly good for one group and bad for others.
>
> How is this different from when the standard library throws an exception
> when it encounters an error? You could argue that it should allow the
> freedom for operator new to assert instead, or some such, right?

Come on, Herb :) you know the two differ to such a degree that we should
never mention them together, really.

Memory is a resource. You don't know how much or how little you have.
bad_alloc is clearly a runtime case that can hit you at any time, influenced
by conditions outside your control.  It is also, in practice, a rare and
exceptional condition.
Any correct program that allocates memory can encounter a failure, and
failure doesn't mean the program is incorrect.
Catching the exception is safe, though many programs will not have any
backup strategy for the case and will just exit.  Or report failure for that
operation and go on with the next.

Using arrays and [] is completely different.  *Most* programs are designed
to set up the arrays, then use the elements _within_ those arrays. An
out-of-bounds access means a BUG in the program -- in the design or the
implementation.
And the bug is *not* in the op[] operation; it happened somewhere upstream,
and here its existence is merely discovered.
Recovering in a knowingly buggy program, at an unknown program state, is
altogether different from recovering from a resource shortage.

There may be other usages of vector; a week ago I asked here for cases of
that, and people here seem to agree that it is a rare, special case of
design to just look at a random location in the vector with at() and catch
the exception.    For the rest of the cases access is expected to be within
bounds, and if it is not, that indeed indicates a bug.

> For range checking in particular, the easy objective measure is whether
> the check turns undefined behavior into another kind of behavior
> (preferably well-defined behavior). The check is "snake oil" iff it
> leaves the behavior undefined.

It is also snake oil if the behavior is defined but just doesn't make sense
for the situation.  Especially if it also leaves a high chance of triggering
UB in the program later.  For example, the prescribed behavior is to throw --
but the programmer, not expecting a bounds error, does not catch it. So the
net outcome is a possible stack unwind in the broken program state.
Or it is caught somewhere up the stack, but all the unwinding still happens
in the broken state.

> Beyond that, IMO unspecified or
> implementation-defined behavior is only moderately better, and for my
> liking the check is not very useful unless the behavior is well-defined.

That leads to another philosophy issue about UB.  A good deal of my current
real-life headaches relate to UB that is not thoroughly specified by my
implementation, though I 'know' what will happen from other sources.  (For
stuff that is practically unavoidable/necessary to use, like
reinterpret_cast results across pointers and integrals, overflow of signed
integrals, etc.)
Also I face incomplete documentation of the libraries I use.

Hm, let's make it more concrete: most of the time I use MSVC 5 and 6 with
MFC on Win32. I cover the holes in the documentation by looking at the
sources, sometimes at the assembly.  That makes it possible to create
working programs with that toolset, but leaves me uneasy.  (It also makes
changing the environment dangerous.)  None of that would be a problem if the
implementer had taken a few more hours to write some more pages of
documentation.    I hope the new versions of the same compiler are better in
that regard, and if you have any influence I beg you to check it out and
push the issue. :)

Back to the main point, I find implementation-defined behavior WAY better
than UB, and a thing you can build a house upon.  Yeah, it may be
nonportable -- but that is only an issue when it's time to actually port
something.  Even then, if the behavior is *mandatory* for the implementer to
specify, I just look at how it is specified here and there; if identical,
hooray; if not, it is clear what the change is.
Besides that, I see no real difference in *where* I find the specification,
or in the title of the tome holding that piece of information.   With C++ it
is unavoidable to look up the implementation specifics in the compiler docs
anyway.
