Equality State Vs Identity

> You can put a Set inside a DIFFERENT set with no trouble at all. In
> fact, from what you have said, out of Ambrai, Dolphin, VSE, VW, and
> Squeak, Squeak is the *only* one where you *can* usefully put a Set
> inside another Set.

I know what you said. But you are implying that a Set is the set of the
mathematicians. Only then can you put a set into another set. Such a
Set must not be changed, which contradicts the programmer's day-to-day
use of Set.
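
As an illustration from outside Smalltalk (Python is used here purely as
an analogy, it is not part of the original discussion): Python splits
the two notions into an immutable frozenset, which may be a member of
another set, and a mutable set, which may not.

```python
# Analogy only: Python's frozenset plays the role of the mathematician's set.
inner = frozenset({1, 2})
outer = {inner}                      # an immutable set can be a member of a set
found = frozenset({2, 1}) in outer   # membership is decided by state equality

rejected = False
try:
    bad = {set([1, 2])}              # a mutable set is unhashable in Python
except TypeError:
    rejected = True                  # Python refuses instead of misplacing it
```

Here the language simply forbids the programmer's changeable Set as an
element, which is one way of resolving the contradiction.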

> One can reasonably expect a Smalltalk programmer to know the two
> fundamental rules of hashing:
>
> (1) x = y implies: x hash = y hash
> (2) A mutable object should not be mutated while its hash value is
>     useful, that is, while it is a member of a set or bag or a key in
>     a dictionary.

That may be clear to you, but I bet that many if not most people who
program in Smalltalk (including many newcomers) are not aware of this
problem. This is demonstrated pretty well by the fact that different
Smalltalk dialects implement Sets differently (and only Squeak gets it
right in your opinion).

After all, one criterion for the quality of a programming language is
how intuitive it is. A Set, and even more a Bag, is a brown sack for me.
I stuff things into it and from time to time I ask whether something
specific is in it. I would be astonished if I gave the order to put
something new into the sack and the sack vanished from the cellar of my
warehouse and popped up in the loft.
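
The surprise can be sketched in Python (again only an analogy, not
Squeak code). StateSack is a hypothetical class invented for this
example; its equality and hash follow its contents, as rule (1) demands,
and the sketch then breaks rule (2) on purpose.

```python
class StateSack:
    """Hypothetical mutable container; == and hash() follow its contents."""

    def __init__(self, *items):
        self.items = list(items)

    def __eq__(self, other):
        return (isinstance(other, StateSack)
                and sorted(self.items) == sorted(other.items))

    def __hash__(self):
        return hash(tuple(sorted(self.items)))


sack = StateSack(1, 2)
warehouse = {sack}
before = sack in warehouse    # found: it is looked up where it was stored

sack.items.append(3)          # put something new into the sack
after = sack in warehouse     # now looked up under a different hash
still_stored = any(member is sack for member in warehouse)
```

On CPython the sack is still physically in the warehouse
(still_stored), but asking for it no longer finds it (after is False).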

Arrays and Strings can be created as literals and cannot change size.
That is the difference from Sets, and it is what I wanted to express
with "explicitly designed to change". Arrays and Strings are more like
the mathematician's set than the programmer's Set is.

A Squeak Set implements the mathematician's equality for #=. My brown
sack would be better served by implementing #= as identity, with #hash
to match, and by an additional #equals: for the mathematicians. Perhaps
the question is whether the average programmer is more of a
mathematician or a warehouse worker.
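
A minimal sketch of that brown sack, written in Python as an analogy.
BrownSack and its equals method are hypothetical names for this
example; Python's default == and hash() for plain objects happen to be
identity-based, which is the behaviour proposed for #= and #hash.

```python
class BrownSack:
    """Hypothetical container: == and hash() stay identity-based (the default)."""

    def __init__(self, *items):
        self.items = list(items)

    def equals(self, other):
        # Separate state comparison, in the role of the proposed #equals:
        return sorted(self.items) == sorted(other.items)


a = BrownSack(1, 2)
b = BrownSack(1, 2)
same_state = a.equals(b)      # the mathematician's question
same_sack = (a == b)          # the warehouse worker's question

shelf = {a}
a.items.append(3)             # changing the contents does not move the sack
a_on_shelf = a in shelf
b_on_shelf = b in shelf
```

The price is that b, although it started out with equal contents, is
never taken for a.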

> I am still waiting for an example where there is a set whose elements
> include Sets and things that are not sets, where the non-set elements
> should be compared using #= (NOT #==) but the Sets should be compared
> as if using #==. 

That is a special version of the key question: Are there two different
kinds of classes (in general and/or for the implementation of Set), one
using state equality and the other identity for #=? After writing my
last posts, it dawned on me that I may have used the word "same"
wrongly. I am still unsure about its exact meaning. Here is a
clarification of what I wanted to express by "intuitively" choosing
state or identity comparison for #=. Let's consider the sequence:

String Array | Set | Bag DataBase

DataBase is a fictitious class which a) has a complex state structure
that is contained entirely in the image, or b) wraps something
external. Testing it for state equality would be very slow, because two
databases with completely equal state are allowed to exist, so the
identity test cannot be used as a shortcut.

I am going to formulate the cardinal intention of a #= check like
this: I put the object in question into a variable and elsewhere I
want to check if some unknown object IS THE ONE I put in the variable.
I consider the main intention of Set to be reflected by this
formulation.

A Set should be able to hold objects of arbitrary classes. Choosing
between an IdentitySet and an "EqualStateSet" (not implemented in
Squeak right now) is not an option if a Set is about whether "THE ONE"
is in it, and the answer to this question has to be implemented
differently for different classes.

Let's start with String. There are String methods which return either
a copy of the receiver, the receiver itself unaltered, or the receiver
itself altered "in place". Very often the intention of the program is
only to get the string's bytes written to a file or the screen.
Programming with Strings is mostly about their state; seen from the
point of view of the result, their pointer is rarely of interest. The
question "IS IT THE ONE" is mostly answered yes if the two objects
have equal state. They may be identical or not; mostly it doesn't
matter. What is more, given the String protocol of Squeak (or of
Smalltalk in general), I often can't be sure whether two state-equal
strings are identical.

Now think of DataBase. Its protocol for the intended standard
operations should offer no method that makes a copy of the whole
database simply in order to change, add or remove something. There
should be only one instance of a specific database; duplicates, other
than for special purposes, are nonsense. If I add something to the
DataBase for thingsOfTypeA, I expect it to still be the DataBase for
thingsOfTypeA.

Now I create two DataBases; one is for thingsOfTypeA and the other
shall contain thingsOfTypeB. At the beginning they are empty and cannot
be distinguished by their state, so the question "IS IT THE ONE" can
only be answered correctly by testing for identity. Even if this were
for some reason not so clear, the implementor might well be forced to
test for identity for performance reasons.
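
The two ends of the sequence can be put into one container in a small
Python sketch (an analogy, not Squeak code). The DataBase class below
is a hypothetical stand-in for the fictitious class of the text, and
"report.txt" is an arbitrary example string. The set answers "IS IT THE
ONE" by state for strings and by identity for the databases.

```python
class DataBase:
    """Hypothetical stand-in for the text's DataBase; compared by identity."""

    def __init__(self):
        self.records = []


things_a = DataBase()
things_b = DataBase()
same_state = things_a.records == things_b.records   # both are empty
same_object = things_a is things_b

registry = {"report.txt", things_a}

# A freshly built, state-equal string counts as "the one" ...
string_found = "".join(["report", ".txt"]) in registry
# ... but an equally empty second database does not.
b_found = things_b in registry

things_a.records.append("thing of type A")
a_found = things_a in registry    # still the DataBase for thingsOfTypeA
```

Each class decides for itself which comparison stands behind the
membership test; the set only asks.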

My sequence starts with a class which suggests a state comparison for
answering the question "IS IT THE ONE" and ends with a class which
needs identity comparison. There has to be a borderline where the
switch is made. In other Smalltalks it is drawn between Array and Set,
and in Squeak between Set and Bag. I think the mere existence of this
borderline in the major Smalltalks shows that your premise can only be
the question and not the answer.

Perhaps it is debatable where the borderline should be drawn. I was
used to seeing a Set more like a DataBase, an opaque container. You are
more mathematical: you think of it as a pattern in a transparent
cover, like a String. As I said, both positions have their merits.
Aside from that, compatibility and performance are important issues,
and they favor the version of the other Smalltalks.