[Effective-cpp] Item 22: Avoid in-place key modification in set and multiset
Kevlin Henney
kevlin at curbralan.com
Sun Dec 7 01:46:07 EST 2003
In message <3FD0BB58.74431EC2 at noaa.gov>, Robert Haynes
<Robert.Haynes at noaa.gov> writes
>
>This item begins with Scott explaining that it's best not to directly
>modify items in sets and multisets, because these items are stored
>in sorted order, and modification could break the sortedness
>constraint. This isn't a problem with other associative containers
>because, for a map<K,V> or multimap<K,V>, each element is
>stored as a pair<const Key, Value> - i.e. the keys are declared const
>and are not modifiable (except possibly with const_cast usage). But
>items in set and multiset are not declared const, and are allowed to be
>modified.
>
>Scott provides an example of when direct modification might be
>desirable,
>by creating sets of Employee objects which have, among others, title and
>EmployeeID data members. The objects are then stored in sorted order
>based on the EmployeeID, and code is given to directly change an
>employee's title. The example seems simple enough, but provides a
>reason for why objects aren't declared const. Scott just warns heavily,
>though, that changing the key part of the object could corrupt the
>container and lead to undefined results.
Interestingly, I have always tended to think that the desire to modify
an object contained in a set is often the consequence of the wrong
design, and trying to force that design further out of shape by fixing
symptoms rather than causes. If you want to put values in a set and then
wish to change parts of the value, your design generally needs fixing,
not the standard library :->
Let us just say that I store my in-memory employee database in a set. As
the concept of an employee is quite clearly an entity, it follows that I
would not wish to hold or treat employee objects as values. They should
typically be allocated on the heap, and should certainly not support
copy construction or assignment. This means that I cannot declare a set
of employees directly, nor would I want to.
OK, so the design now holds a set of pointers to employees. Associative
containers are free to use pointers as keys -- the ordering in the
container will be arbitrary but total. In one way, this unasks the
question of modifications that break the key, because it is no longer
either desirable or possible (barring wilful acts of cast terrorism).
However, it is unlikely that keying on pointer value was quite what we
had in mind: sorting by employee ID is a useful property, especially if
we want to look up by ID as well as ensure that IDs are unique.
The route to such customisation is through the normally defaulted
comparison template parameter. Define a suitable function object type
that takes two employee pointers and compares them by dereferencing to
their IDs. Problem solved... or problem reintroduced?
There is no requirement that the set hold pointers to const employee
objects so in principle the key could change outside the set, and once
more you have a hosed set instance. Well, in principle this would
suggest that the employee design is broken: it is reasonable to expect a
query function to access the ID, but to provide a modifier is not
reasonable. The ID should be effectively immutable once created, so this
once more unasks the problem. Providing setters as a matter of habit is
often a problem in many systems that I see: the solution is generally to
remove them :->
OK, so now we have a set of pointers to employee objects whose ordering
is determined by a function object that examines immutable employee IDs.
The original problem has, in effect, been solved. But there is still a
problem with the design that I have always had trouble with: how do I
search for employees by ID? If this isn't actually a requirement, the
problem has been solved by dissolution. If it is, we have a problem. In
fact, even if we didn't use the entity-based solution, we still have
this problem, so it is not a new one -- just one that I chose to defer
for a few paragraphs :->
You can use find, count, lower_bound, upper_bound or equal_range to
interrogate the existence or position of an entry in an associative
container. You use the key, which in the case of the set is the whole
object. So, the only way you can find an employee object is by providing
an employee object! This is not at all what we want, and to pursue this
line of thinking only leads to a twisted design where we must provide
dummy employee objects whose only purpose is to hold an employee ID and
then be discarded once the real employee of that ID has been found (or
not).
If you want a key to find an object, then use a data structure that is
organised in terms of using a key to find an object: a map. A set is not
specifically designed to fulfil this role whereas a map specifically and
explicitly is. In other words, part of the problem is that, from the
outset, we have been using essentially the wrong data structure to hold
our employees!
From the design that we have, the simplest approach is simply to
parameterise the map on the ID type, an employee pointer and optionally
a function object for the ID type -- I say optionally because the ID
type might already be naturally and reasonably ordered wrt operator<. An
employee is then inserted by passing both its ID and the pointer to it,
from which we obtained the ID. This means that there is some redundancy
involved, ie a duplication of the data involved. OTOH, such a slight
denormalisation is less of a problem when the data involved is
immutable: it is denormalised mutable data that is the real trouble
maker. However, weighed against these concerns we must concede that the
design is significantly simpler when it comes to use and preserving the
integrity of the data.
If the direct duplication is likely to cause us anxiety and sleepless
nights -- as well as use up more resources than necessary (eg if the ID
type is a very big string indeed) -- then rather than provide a copy of
the ID as the key a pointer can be passed. If the ID type is a VBSI I
would presume that the employee would return it via a level of
indirection anyway, either by reference or as a const char *. If the ID
type does not fit this model of usage, you can feasibly use a pointer to
the employee object as the key as well as the mapped type, and provide
the original function that performed this ordering. I am less keen on
this variation because to some extent it represents even greater
denormalisation than just copying or pointing to the key, and it
reintroduces the dummy-employee-for-searching problem. However, solving
the problems of large arbitrary-sized keys is not something that keeps
me awake at night because such keys are often the problem to be solved
:->
Alternatively, we can question whether or not the ID is actually a part
of the employee object at all. If I am likely to be dealing in
free-standing employee objects where I need to get the employee ID, then
yes having the ID as part of the employee object may make sense.
However, if my most common model of usage is to look up employees by ID
or to walk through the employee collection then I know or have access to
the key already and don't need it to be duplicated in the employee
object for reverse lookup. In terms of reverse lookup, because both the
employee pointer and the ID are unique, you could chose to use a two-way
mapping, either in a custom container object built for such a purpose or
co-ordinated and maintained between two different tables in the manager
object that looks after the employees.
In summary, it seems that, if not all then certainly most, attempts to
highlight some problem with set and element mutability are based on poor
models of the problem rather than on any issue with sets, which should
always quite reasonably treat and expose their elements as immutable.
The working detailed above shows why, for the employee problem and
others like it, the item has not successfully identified a problem, let
alone established its scope. The fact that the item focuses on the
linguistic rather than the semantic seems to further fan the perception
of a problem rather than quench it.
Kevlin
--
____________________________________________________________
Kevlin Henney phone: +44 117 942 2990
mailto:kevlin at curbralan.com mobile: +44 7801 073 508
http://www.curbralan.com fax: +44 870 052 2289
Curbralan: Consultancy + Training + Development + Review
____________________________________________________________
More information about the Effective-cpp
mailing list