[Effective-cpp] Item 22: Avoid in-place key modification in set and multiset

Sun Dec 7 01:46:07 EST 2003

In message <3FD0BB58.74431EC2 at noaa.gov>, Robert Haynes 
<Robert.Haynes at noaa.gov> writes
>
>This item begins with Scott explaining that it's best not to directly
>modify items in sets and multisets, because these items are stored
>in sorted order, and modification could break the sortedness
>constraint.  This isn't a problem with other associative containers
>because, for a map<K,V> or multimap<K,V>, each element is
>stored as a pair<const Key, Value> - i.e. the keys are declared const
>and are not modifiable (except possibly with const_cast usage). But
>items in set and multiset are not declared const, and are allowed to be
>modified.
>
>Scott provides an example of when direct modification might be
>desirable,
>by creating sets of Employee objects which have, among others, title and
>EmployeeID data members. The objects are then stored in sorted order
>based on the EmployeeID, and code is given to directly change an
>employee's title.  The example seems simple enough, but provides a
>reason for why objects aren't declared const.  Scott just warns heavily,
>though, that changing the key part of the object could corrupt the
>container and lead to undefined results.

Interestingly, I have always tended to think that the desire to modify 
an object contained in a set is often the consequence of the wrong 
design, and trying to force that design further out of shape by fixing 
symptoms rather than causes. If you want to put values in a set and then 
wish to change parts of the value, your design generally needs fixing, 
not the standard library :->

Let us just say that I store my in-memory employee database in a set. As 
the concept of an employee is quite clearly an entity, it follows that I 
would not wish to hold or treat employee objects as values. They should 
typically be allocated on the heap, and should certainly not support 
copy construction or assignment. This means that I cannot declare a set 
of employees directly, nor would I want to.

OK, so the design now holds a set of pointers to employees. Associative 
containers are free to use pointers as keys -- the ordering in the 
container will be arbitrary but total. In one way, this unasks the 
question of modifications that break the key, because it is no longer 
either desirable or possible (barring wilful acts of cast terrorism). 
However, it is unlikely that keying on pointer value was quite what we 
had in mind: sorting by employee ID is a useful property, especially if 
we want to look up by ID as well as ensure that IDs are unique.

The route to such customisation is through the normally defaulted 
comparison template parameter. Define a suitable function object type 
that takes two employee pointers and compares them by dereferencing to 
their IDs. Problem solved... or problem reintroduced?

There is no requirement that the set hold pointers to const employee 
objects so in principle the key could change outside the set, and once 
more you have a hosed set instance. Well, in principle this would 
suggest that the employee design is broken: it is reasonable to expect a 
query function to access the ID, but to provide a modifier is not 
reasonable. The ID should be effectively immutable once created, so this 
once more unasks the problem. Providing setters as a matter of habit is 
often a problem in many systems that I see: the solution is generally to 
remove them :->

OK, so now we have a set of pointers to employee objects whose ordering 
is determined by a function object that examines immutable employee IDs. 
The original problem has, in effect, been solved. But there is still a 
problem with the design that I have always had trouble with: how do I 
search for employees by ID? If this isn't actually a requirement, the 
problem has been solved by dissolution. If it is, we have a problem. In 
fact, even if we didn't use the entity-based solution, we still have 
this problem, so it is not a new one -- just one that I chose to defer 
for a few paragraphs :->

You can use find, count, lower_bound, upper_bound or equal_range to 
interrogate the existence or position of an entry in an associative 
container. You use the key, which in the case of the set is the whole 
object. So, the only way you can find an employee object is by providing 
an employee object! This is not at all what we want, and to pursue this 
line of thinking only leads to a twisted design where we must provide 
dummy employee objects whose only purpose is to hold an employee ID and 
then be discarded once the real employee of that ID has been found (or 
not).

If you want a key to find an object, then use a data structure that is 
organised in terms of using a key to find an object: a map. A set is not 
specifically designed to fulfil this role whereas a map specifically and 
explicitly is. In other words, part of the problem is that, from the 
outset, we have been using essentially the wrong data structure to hold 
our employees!

 From the design that we have, the simplest approach is simply to 
parameterise the map on the ID type, an employee pointer and optionally 
a function object for the ID type -- I say optionally because the ID 
type might already be naturally and reasonably ordered wrt operator<. An 
employee is then inserted by passing both its ID and the pointer to it, 
from which we obtained the ID. This means that there is some redundancy 
involved, ie a duplication of the data involved. OTOH, such a slight 
denormalisation is less of a problem when the data involved is 
immutable: it is denormalised mutable data that is the real trouble 
maker. However, weighed against these concerns we must concede that the 
design is significantly simpler when it comes to use and preserving the 
integrity of the data.

If the direct duplication is likely to cause us anxiety and sleepless 
nights -- as well as use up more resources than necessary (eg if the ID 
type is a very big string indeed) -- then rather than provide a copy of 
the ID as the key a pointer can be passed. If the ID type is a VBSI I 
would presume that the employee would return it via a level of 
indirection anyway, either by reference or as a const char *. If the ID 
type does not fit this model of usage, you can feasibly use a pointer to 
the employee object as the key as well as the mapped type, and provide 
the original function that performed this ordering. I am less keen on 
this variation because to some extent it represents even greater 
denormalisation than just copying or pointing to the key, and it 
reintroduces the dummy-employee-for-searching problem. However, solving 
the problems of large arbitrary-sized keys is not something that keeps 
me awake at night because such keys are often the problem to be solved 
:->

Alternatively, we can question whether or not the ID is actually a part 
of the employee object at all. If I am likely to be dealing in 
free-standing employee objects where I need to get the employee ID, then 
yes having the ID as part of the employee object may make sense. 
However, if my most common model of usage is to look up employees by ID 
or to walk through the employee collection then I know or have access to 
the key already and don't need it to be duplicated in the employee 
object for reverse lookup. In terms of reverse lookup, because both the 
employee pointer and the ID are unique, you could chose to use a two-way 
mapping, either in a custom container object built for such a purpose or 
co-ordinated and maintained between two different tables in the manager 
object that looks after the employees.

In summary, it seems that, if not all then certainly most, attempts to 
highlight some problem with set and element mutability are based on poor 
models of the problem rather than on any issue with sets, which should 
always quite reasonably treat and expose their elements as immutable. 
The working detailed above shows why, for the employee problem and 
others like it, the item has not successfully identified a problem, let 
alone established its scope. The fact that the item focuses on the 
linguistic rather than the semantic seems to further fan the perception 
of a problem rather than quench it.

Kevlin
-- 
____________________________________________________________

   Kevlin Henney                   phone:  +44 117 942 2990
   mailto:kevlin at curbralan.com     mobile: +44 7801 073 508
   http://www.curbralan.com        fax:    +44 870 052 2289
   Curbralan: Consultancy + Training + Development + Review
____________________________________________________________