[Effective-cpp] Item 1: Uses and Abuses of vector

White Wolf wolof at freemail.hu
Wed Oct 27 18:39:16 EDT 2004


effective-cpp-bounces at accu.org wrote:
>>> OTOH, loop optimization is a special area in the optimizer, and a
>>> good one will go lengths to find invariants, removing them from
>>> loops, on such one we will see identical binary.
>>> Locals are pretty well optimized, especially ones with trivial data
>>> flow -- so I can't see any chance to observe pessimized code.
>> 
>> I can.  If accessing the local is the same cost as accesing end(),
>> your
> code
>> is longer with that very assignment to the local.  And in addition
>> more fragile.
> 
> That would require a completely useless optimizer. As I
> mentioned, access to locals is the easiest optimisation.

I don't get you.  If accessign the local is the same effort as accessing the
end() member of the vector (which is a viable case, if both are simply
stored into a register as they do not change) copying the end() to a local
is more work to be done, even if only at compile time (eliminating it) or
even if only in the debug version (where it is not eliminated).  It is
simply unecessary until it is proven otherwise.  (Again: see Michael
Jackson's rules.)

>>> As I mentioned in my first post on this issue, I write loop headers
>>> a certain way to express the intent, not for speed reasons.
>>> Precalculated end means the the container will be stable, call
>>> end() in the test means it will change in some cases.   My personal
>>> meta-info. :) 
>> 
>> My problem with this approach is that it only works in the
> ideal case.
> The
>> expression "My personal" does not exist in large teams or large
>> organizations.
> 
> At large organisations 'my personal' will be substituded for
> some company policy. Backed up by a review system enforcing
> it.  (Hopefully ;)

Yeah.  Hopefully.  In practice your coding rules cannot be longer than a
short book, and you can only hope that they will be always remembered and
kept: unless they are very simple and straightforward.

> If I were to write one such policy I'd simply shotrcut your
> problem-set: forbidding that for() iteration on a vector that adds or
> removes elements in the body.

Which works as long as people write for loops but breaks when they change
them 3 years later during a debugging night session.  This for loop is
anyways not a good example of the major do-not-complicate rule, since - as
you have pointed out -, it can be broken in many ways even if it is written
without the premature optimization/idiomatic-hint first.

> Such operation shall be rare by nature, and for most using a vector
> is an abuse in its own right (wow, we're on topic ;).   And even if
> we really want the vector, the operation must be carried out with
> extreme care.
> Looking every step, checking how the ins/del
> influences out current cursor.  Even if you just push_back()
> elements, and have enough reserve, it is still unclear
> whether you want the iteration include the newly added
> elements, or stop at the former end.

Yeah, fiddling with vector can be dangerous at times.  That is why I
frequently think that using deque seems to be many times a better choice
than vector, unless you really need your elements to look like an array.

> All that stuff shall be there commented.

I do not agree with commenting every single detail.  The only comment I
would put there would say:

// if you do not understand this, do not touch it!

The reason being that comments describing what the code does tend to either
go out of sync or in other ways become just clutter (e.g.: people get better
at how to use vector).

>>>> But I have seen broken code too
>>>> many times, where end() was pre-evaluated and the vector (later)
>>>> changed inside the loop.
>>> 
>>> Well, that happens when the tail swings the dog.
>> 
>> Wag the dog.  I don't think so.  It happens, when
> unexperienced people
> write
>> C++ code, or touch existing C++ code.  Which means it happens every
>> day. 
> 
> In "in large teams or large organizations"?
> Unexperienced or unaware people will always wreak havoc --
> you can suggest them anything you like.

Except that if code is written on a way to try to avoid *inviting* them to
break things, they will have less chance to do so.

> IMHO books like the
> one we discuss, and this discussion is aimed at those already
> having the basic experience, and look for ideas to improve.

Sure.  And within 3 months I was able to get out 4 promises of the bosses,
that they will actually order the basic books for people.  Absolutely no
books, but I have nice promises.  That is life.  And believe me that for
summer interns they won't buy the book, and also believe me that it will be
me, in 16 hours days, cleaning up the mess aftewards - if the code, any
piece of it, invites you to break it.

> The primary purpose to write some code is to do some
> behavior. Do what is intended.  Speed and other optimisation
> issues are secondary by nature.

Yep.  I think this is quite obvious and I do not see when did I say
otherwise.

> In a professional environment it is even more the case, there
> I wouldn't even think certain shape of code suggest
> optimisation, it shall reflect the intent instead.

Probably.  But in this mailing list, if it is sent in with the word "faster"
next to it, I believe it is good enough cause to me to think that it was
intended as an optimization.  And please also believe me (or believe Herb
for that matter, he has a nice war story about it) that large organization
fall into the trap of premature optimization just the same as an individual.

> So seeing end() set aside in a const variable to me suggests
> 'end() will not change' and not something else.

To you - and now for people reading this list.  But I just know 50
programmers next door, to whom it only suggests either optimization or
nothing at all.  I also know that we could also easily find 50 others in a
day (anywhere where there are enough programmers) who will start blindly
copy-paste-change that loop formula, because they think that is tobe used
all the time.

> Then if I
> see vector getting modified in the loop, it shall trigger an error.

Again: to you.  It is not a well known convention.  It does not mean it is
bad.  But if you ask the readers of this list, what did they think 3 days
ago if they have seen such a pre-fetched end() in a for loop, I am pretty
sure that the overwhelming majority would say: attempted optimization.  So
while your way-to-communicate might be just the best idea, it will not serve
its purpose until everyone knows what it means (understands it the same
way).

[SNIP]
> Suppose the example mentioned -- reserve at beginning of
> function, then add element in the iteration, expecting you
> stay within capacity. Thus just use the iterators. Someone
> may change the code later removing that reserve() thinking it
> just another optimisation step.  Or comes up with some other
> optimistic guess for the item count -- thinking it covers
> most cases, and the rest may go with an extra reallocation.

That why you see there:

// Black magic, do not change until you have grasped its workings!

:-)

> For such cases you better annonce the parts necessary to work
> in ensemble.
> And in the process I see little chance to introduce the bug
> with the loop end condition.

I would not do so.  I only warn and force people to think when a quick
glance (what most people do) is not enough.  If I start explaining the code
in comments, those comments will end up being out of sync, just like 88% of
the comments I have seen in my life.  Only 88%, because copyright notices in
comments do not expire. :-)

>>> Unless you expect the code to be literary -- and suggest the intent.
>>> In that case set-aside end mean stable vector, and not a means to
>>> gain some cycles.
>> 
>> That may be true for you, but I do not think it is a convention yet.
> 
> Conventons are tied to the environment. :)  Certainly if the
> codebase you work on is not shaped this way you can not use
> that convention -- even if it wolud be a worldwide accepted
> and used idiom.

If it would be a world-wide accepted and understood convention (I would not
use the word idiom here, since programming idioms usually do something and
this one "only" communicates to the reader), so if it would be a world-wide
accepted and understood convention then starting using it in my codebase at
any given time would be feasible.  While if it is not, it just "steepens"
the learning curve of every new programmer added to the project.  OTOH
(unless it is proven to cause cancer in the virtuasl destructors) I would
not mind your notation/convetnion/idiomatic-hint to be worldwide-accepted.

[SNIP]
>>> If in your environment people tend to screw up, will that be better
>>> to know in the rest of cases they gain speed?
>> 
>> If doing so gains speed than it is worth to teach people when to do
>> so and when not.  So that they will not screw up.  If it does not or
>> rarely does, then it is much-much easier to just teach the idiomatic
>> form with no pre-fetched end().
> 
> That covers only half of the issue.

Yeah.  But is is covered by their C++ course *outside* my place and not from
the same budget my salary comes from.  To be very jerk-like. :-)  What I
mean is that one should be really careful while introudcing non-conventional
elements into the code to ensure that their benefits overweight their costs.
Most people in a "normal" large C++ project are *not* C++ programmers.  They
are domain experts.  Which means that they need to keep up with their own
expert domains (for example protocols being standardized as we sepak, every
day, interworking issues etc).  Every larger project will have a framework
of some kind, also introducing a (usually large) set of rules.  The
environment, the platform, the required portability etc. will also introduce
its own many little rules.  Which all *must* be known and kept.  So if I can
drop a rule (ands its burden/stress of learning, keeping, checking) without
making any noticeable tradeoff on code quality, I will gladely do so.  Some
people will even willing to change the language/library to make this happen,
to make incidental programmers' life easier.

>>> Or if it turns out equal, will sticking to direct
>>> end() calls remove *all* the related problems?
>> 
>> I cannot see how that question is relevant.
> 
> It is tied to the complex problem of 'iterating a vector and
> reshaping it en-route'. Messing up the end-condiotion is only
> one of the dangers.  If people think it is the only one, they
> will fall to pray to the other issues.
> (some may get pretty subtle, depending on random actual
> capacity of the vector).

Sure.  But believe me that if my doctor would say: I can get rid of the pain
in your fingers and wrists but not in the knees - I would go for it.
Especially if it all means to stop doing something. :-)

> IMHO if you cover this whole prolem, messing up the 'cacheing
> end' issue just goes away by itself.  Then there's no point
> to force either to cache it or not.

I can understand and accept that this is your point of view.

I guess we have managed to cover that tiny bit of detail very thoroughly.
:-)  Let's give some space to the new item now.

Thanks for the discussion!

Attila aka WW





More information about the Effective-cpp mailing list