[Effective-cpp] Item 1: Uses and Abuses of vector

Wed Oct 27 22:18:19 EDT 2004

> >> I can.  If accessing the local is the same cost as accessing end(),
> >> your code is longer with that very assignment to the local.  And in
> >> addition more fragile.
> >
> > That would require a completely useless optimizer. As I
> > mentioned, access to locals is the easiest optimisation.
>
> I don't get you.  If accessing the local is the same effort as accessing
> the end() member of the vector (which is a viable case, if both are simply
> stored into a register as they do not change)

That is one tricky point. The data flow of a local is all there, clear and
unmissable, as long as its address never gets passed to the outside world.
Proving that a member of the vector does not change requires *all* functions
used on that vector to be inlined. Even then a code generator may choose to
play conservative, and drop the flow analysis as soon as the vector's
address is passed to any function.  Having pointers and refs around is
generally an inhibitor. As class members are always accessed through a this
pointer, it is easy for them to get 'lost'.
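
A minimal sketch of the two loop shapes being compared. The names
(log_value, sum_calling_end, sum_cached_end) are hypothetical and only there
for illustration; whether a given compiler actually generates different code
for them is something only a profiler or the assembly listing can tell.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a call the compiler cannot see through; in real
// code it would live in another translation unit.
void log_value(const std::vector<int>& owner, int value, long& total)
{
    (void)owner;       // the vector's address is handed out here
    total += value;
}

// end() is re-evaluated on every pass; it can be hoisted only if the
// compiler proves that log_value() leaves the vector alone.
long sum_calling_end(const std::vector<int>& v)
{
    long total = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
        log_value(v, *it, total);
    return total;
}

// The end iterator sits in a local whose address never escapes, so its
// data flow is visible without any inlining.
long sum_cached_end(const std::vector<int>& v)
{
    long total = 0;
    const std::vector<int>::const_iterator last = v.end();
    for (std::vector<int>::const_iterator it = v.begin(); it != last; ++it)
        log_value(v, *it, total);
    assert(last == v.end());   // the loop body must not resize the vector
    return total;
}
```

Both functions return the same sum; the second one also states in code that
the vector is not expected to change size inside the loop.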

> > If I were to write one such policy I'd simply shortcut your
> > problem-set: forbidding that for() iteration on a vector that adds or
> > removes elements in the body.
>
> Which works as long as people write for loops but breaks when they change
> them 3 years later during a debugging night session.

There's no known way to sidestep Murphy's Law.

> > Such operation shall be rare by nature, and for most using a vector
> > is an abuse in its own right (wow, we're on topic ;).   And even if
> > we really want the vector, the operation must be carried out with
> > extreme care.
> > Looking at every step, checking how the ins/del
> > influences our current cursor.  Even if you just push_back()
> > elements, and have enough reserve, it is still unclear
> > whether you want the iteration include the newly added
> > elements, or stop at the former end.
>
> Yeah, fiddling with vector can be dangerous at times.  That is why I
> frequently think that using deque seems to be many times a better choice
> than vector, unless you really need your elements to look like an array.

Yeah.  I generally use lists if on-the-fly insertions or deletions are
needed. And if a vector is 'transformed' I copy the contents to a new one,
and swap with the original at the end. That way leaves less chance to break,
and even plays better with lurking exceptions. It comes at some performance
cost that is considered negligible -- if it weren't, the vector would have to
be replaced.
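
A small sketch of that copy-then-swap habit. The transformation itself
(drop negatives, double the rest) and the function name are made up for the
example; only the shape matters.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical transformation: drop negative values and double the rest.
void drop_negatives_and_double(std::vector<int>& v)
{
    std::vector<int> result;
    result.reserve(v.size());            // never more elements than before

    for (std::size_t i = 0; i != v.size(); ++i)
        if (v[i] >= 0)
            result.push_back(v[i] * 2);  // v itself is only read here

    assert(result.size() <= v.size());
    v.swap(result);  // commit; with the default allocator swap() does not throw
}
```

If anything throws while result is being built, v is left exactly as it was,
and no iterator into v is ever used while v is changing.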

> > All that stuff shall be there commented.
>
> I do not agree with commenting every single detail.  The only comment I
> would put there would say:
>
> // if you do not understand this, do not touch it!

That goes against SD's gotcha #1.  That line brings no information, as it
applies to any code anywhere.  And it does nothing to help understand the
code.

Also, that 'all stuff' may read as misleading.  I don't mean to go
Balzac-style.  In practice 2-3 sentences can cover all the design-related
decisions, and mark the lines depending on each other.  It's not about
showing off, just about keeping all the information there.
When the code is straightforward, it documents itself, but when there are
those hidden relations, they must be added as a comment.

> The reason being that comments describing what the code does tend to
> either go out of sync or in other ways become just clutter (e.g.: people
> get better at how to use vector).

I tend to insert what the state is at some key point (only where it is not
obvious), or where a new operation starts, or a reference to a step in
outside docs.

If I deal with fragile objects like pointers or iterators, I either keep them
valid, or immediately comment that they became invalid or entered some special
state.  Or, after an operation that could invalidate but didn't, I put in a
positive note.  Otherwise it will repeatedly steal time: there is some problem
with the code, and you spot that location as a possible source, spending
rounds until it turns out not to be the real one.

Again that may sound gross, but IRL the occasions are rare when functions
are short and ideas can be and are expressed directly in code.  (Like:
immediately resetting the pointer removes the need for a comment about its
validity.)
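
One way to read the "reset it immediately" idea with iterators, as a sketch
only. The function name and the task (removing zeros) are hypothetical; the
point is that the iterator is reassigned from erase() on the same line, so
there is no window in which it is invalid and nothing to comment.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: remove every zero from a vector in place.
void remove_zeros(std::vector<int>& v)
{
    std::vector<int>::iterator it = v.begin();
    while (it != v.end())        // end() is re-read, since erase() moves it
    {
        if (*it == 0)
            it = v.erase(it);    // reassigned at once, never left dangling
        else
            ++it;
    }
    assert(it == v.end());       // positive statement of the state after the loop
}
```

For this particular job std::remove plus erase would be the usual choice; the
hand-written loop is kept here only to show the iterator handling.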

Keeping comments in sync with code -- well, sometimes a mismatch really gets
discovered only at some later update.

> > In "in large teams or large organizations"?
> > Inexperienced or unaware people will always wreak havoc --
> > you can suggest them anything you like.
>
> Except that if code is written in a way to try to avoid *inviting* them to
> break things, they will have less chance to do so.

Dunno. I lost my sense of how beginners think, and just play it by ear: when
a problem is discovered, I try to go after it. Thinking ahead is much like the
premature optimisation that often turns out to be pessimisation. :)

> > IMHO books like the
> > one we discuss, and this discussion is aimed at those already
> > having the basic experience, and look for ideas to improve.
>
> Sure.  And within 3 months I was able to get out 4 promises of the bosses,
> that they will actually order the basic books for people.  Absolutely no
> books, but I have nice promises.  That is life.  And believe me that for
> summer interns they won't buy the book, and also believe me that it will
> be me, in 16 hours days, cleaning up the mess afterwards - if the code, any
> piece of it, invites you to break it.

I found having the book is less of a problem -- making people read it is the
real one. The first just costs a couple of bucks, the other, well, that beats
me.

> > The primary purpose to write some code is to do some
> > behavior. Do what is intended.  Speed and other optimisation
> > issues are secondary by nature.
>
> Yep.  I think this is quite obvious and I do not see when did I say
> otherwise.

You didn't. Though the whole thread came up from the point whether the
speed-gain of fiddling with end() is positive, and what is its measure.

IMHO that is exactly the question that shall be kept for profiling time --
and before that, shape the code by other criteria.

> > In a professional environment it is even more the case, there
> > I wouldn't even think certain shape of code suggest
> > optimisation, it shall reflect the intent instead.
>
> Probably.  But in this mailing list, if it is sent in with the word
> "faster" next to it, I believe it is good enough cause to me to think that
> it was intended as an optimization.  And please also believe me (or believe
> Herb for that matter, he has a nice war story about it) that large
> organizations fall into the trap of premature optimization just the same as
> an individual.

LOL. When we talk about coding, the dream world and the real world are easily
tangled -- some sentences describe what we see, others what we
want/expect/would like to see. Certainly I believe all kinds of s*** happen
everywhere, and just being large does not prevent it.  MS is not exactly a
small company, yet it is still fighting the most stupid bugs like buffer
overruns.

The quoted stuff relates more to the dream environment, where code is
checked in after some review, and at least the most basic principles are
enforced.

Certainly looking at some random code written by no-one knows who 10 years
ago, I would not expect anything.  Not until I grok the style and principles
behind that code, if they exist.

> > So seeing end() set aside in a const variable to me suggests
> > 'end() will not change' and not something else.
>
> To you - and now for people reading this list.  But I just know 50
> programmers next door, to whom it only suggests either optimization or
> nothing at all.  I also know that we could also easily find 50 others in a
> day (anywhere where there are enough programmers) who will start blindly
> copy-paste-change that loop formula, because they think that is to be used
> all the time.

Somewhere I mentioned style, idioms, etc are dependent on the environment.

Unfortunately it is really easier to find blind copypasters than people
actually knowing all the whys.

> [SNIP]
> > Suppose the example mentioned -- reserve at beginning of
> > function, then add element in the iteration, expecting you
> > stay within capacity. Thus just use the iterators. Someone
> > may change the code later removing that reserve() thinking it
> > just another optimisation step.  Or comes up with some other
> > optimistic guess for the item count -- thinking it covers
> > most cases, and the rest may go with an extra reallocation.
>
> That is why you see there:
>
> // Black magic, do not change until you have grasped its workings!
>
> :-)

Then comes your 'debugging night session 3 years later'.

I'd put in a few words of tail comment:

vec.reserve(oldsize + 3); // line ** below will break unless having space
                          // for 3 extra items
... for(...)
{...
    vec.push_back(newitem); // ** iterators still valid as we reserved the
                            // extra space
}

Call it my kind of magic to "not invite any idiots to easily break the code".
;-)
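
For completeness, a compilable sketch of the same pattern. The task (append
copies of up to the first 3 elements) and the names are invented for the
example; the tail comments are the ones from above. The loop counts to a
fixed number instead of comparing against a stored end(), because
push_back() invalidates the past-the-end iterator even when no reallocation
happens.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical task: append a copy of each of the first (at most) 3 elements.
void append_copies_of_first_three(std::vector<int>& vec)
{
    const std::size_t oldsize = vec.size();
    const std::size_t extra = oldsize < 3 ? oldsize : 3;

    vec.reserve(oldsize + extra);  // line ** below will break unless having
                                   // space for 'extra' items
    assert(vec.capacity() >= oldsize + extra);

    // Take the iterator after reserve(), which may itself reallocate.
    std::vector<int>::iterator it = vec.begin();
    for (std::size_t n = 0; n != extra; ++n, ++it)
        vec.push_back(*it);        // ** iterators still valid as we reserved
                                   // the extra space
}
```

The count also settles the other open question: the iteration stops at the
former contents and never walks into the newly added elements.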

And if they do at least I can flame them at will.

> > For such cases you better announce the parts necessary to work
> > in ensemble.
> > And in the process I see little chance to introduce the bug
> > with the loop end condition.
>
> I would not do so.  I only warn and force people to think when a quick
> glance (what most people do) is not enough.

I recall a night debug session, staring clueless at why the code doesn't pass
some point where I had even inserted a messagebox popup.  Later I discovered
my very own inserted banner telling 'this code is called from a worker thread,
do not even think of using GUI, especially messageboxes.'  It was as many as 3
lines above the point where I inserted the messagebox.

I'm a little skeptical that people will reliably spot comments that are not
near enough.

> [SNIP]

> I guess we have managed to cover that tiny bit of detail very thoroughly.
> :-)  Let's give some space to the new item now.

Yeah. :)