Squeak Smalltalk: Modules, Components, Projects

Andrew P. Black wrote:

> DeltaModules are a kind of Module[2].  They are distinguished from
> base modules by being defined by difference.  ...
> So, my question is: why do you consider DeltaModules as being
> conceptually different from Modules at all?  Why do they have
> different properties, just because they happen to have a different
> representation? ...
> Why do DeltaModules and Module not have identical
> protocols?  Why are DeltaModules and regular Modules not just two
> implementation classes with the same interface?

You are right: My specification of DMs was unclear as well as unfinished,
but I wanted to put it out early along with the code.

It did mix the implementation aspect with how DMs would be used, in the
manner of "here are some problems, and this is how DMs can address them".
The implementation difference is not strictly linked to the uses I
mentioned, but in practice there will be such a link: the convenience of the
delta representation means that class extensions probably will be put in DMs
99% of the time, and so on. But I know it may have looked like there was a
harder link there. (I hope this answered your policy vs. mechanism point as
well, otherwise let me know.)

You are also right in that the differences between regular and delta modules
should be as small as possible, and that we should be specific about what
they are. However, the two cannot be identical: a DM needs to be defined
relative to a base module, and the concept of a base module is meaningless
for a regular module. This ought to be the only real difference. Then we may
want to have some tool-like support for viewing DMs in terms of differences
wrt the base module, but those are minor issues.

I think that this situation between Modules and DeltaModules is rather
common in class/subclass relations, and I think your question can be
understood by analogy to other known cases: why aren't the protocols for
Array and OrderedCollection identical, and so on. It is because they are
very similar but still slightly different.

The result of activating/installing a DM is equivalent to creating a
new/different version of the base module by filing in all the contents as a
regular Module. (I now think that de/activating a DM is a better term than
un/installing it.) You aren't forced to use DMs for class extensions, but
filing in five new methods is clearly more convenient than loading a new
version of the whole module that contains String.

Activating a DeltaModule does not modify its base module--it creates a new
version of the base module, with the modifications installed. Being able to
modify a module is a bad thing since different code packages should be able
to specify a module by version and be able to rely on its contents. But this
distinction is a technical detail.
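
A minimal sketch of this "activation creates a new version" idea, written in Python rather than Squeak so that it can be run on its own. The class names, the `activate` method, and the version-numbering scheme are assumptions made for illustration; they are not the interface of the actual module code.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Module:
    """A self-contained, read-only module version (hypothetical model)."""
    name: str
    version: int
    methods: MappingProxyType  # maps "Class>>selector" to source text


@dataclass(frozen=True)
class DeltaModule:
    """Changes defined relative to exactly one base module version."""
    base: Module
    additions: dict        # methods to add or replace
    removals: tuple = ()   # method keys to delete

    def activate(self) -> Module:
        # Build a new version; the base object itself is never touched.
        merged = dict(self.base.methods)
        merged.update(self.additions)
        for key in self.removals:
            merged.pop(key, None)
        return Module(self.base.name, self.base.version + 1,
                      MappingProxyType(merged))


strings_v1 = Module("Strings", 1,
                    MappingProxyType({"String>>size": "(source omitted)"}))
delta = DeltaModule(strings_v1, {"String>>asUrl": "^Url fromString: self"})
strings_v2 = delta.activate()
```

Code that asked for `strings_v1` still gets exactly the contents it relied on, while `strings_v2` carries the extension.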

ducasse stephane wrote:

> I have mod1 with extensions (DM1) = String>>asUrl
>
> can I see String>>asUrl from mod2 if there are no relations between the
> two?
>
> I have the intuition that I should see asUrl from a module only if this
> module depends on mod1.
>
> What is your point of view on that?

There is a short answer to this: Yes, that was just how I intended it to be.

More precisely, when the DeltaModule containing your extensions is
made active, this creates a new, modified version of the base module
with the changes installed into it.  If mod2 doesn't specify a particular
version of the modified module, then it might use the new version as well,
but if it specifically wants the old version then it won't see the
changes.

If two different loaded systems want to modify the same module (the same
class/es), then you need to switch between the two module versions created
by the modifications--you enter into the area of conflict resolution.
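
The version-based visibility described here can be sketched roughly as follows. This is a Python model, not Squeak code; the `repository` layout, the `visible_methods` helper, and the rule that an unpinned import resolves to the newest version are all assumptions for the sake of the example.

```python
# Hypothetical model: a repository maps (module name, version) to the set of
# method names that this version defines.
repository = {
    ("Strings", 1): {"String>>size"},
    ("Strings", 2): {"String>>size", "String>>asUrl"},  # after activating DM1
}


def latest_version(name):
    """Highest version number stored for the named module."""
    return max(v for (n, v) in repository if n == name)


def visible_methods(imports):
    """Methods a client sees, given its imports.

    `imports` maps a module name to a pinned version, or to None for "any",
    which this sketch resolves to the newest version.
    """
    seen = set()
    for name, pinned in imports.items():
        version = pinned if pinned is not None else latest_version(name)
        seen |= repository[(name, version)]
    return seen


mod2_unpinned = visible_methods({"Strings": None})
mod2_pinned = visible_methods({"Strings": 1})
```

A client that pins version 1 never sees `String>>asUrl`; a client that does not care about the version may see it.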

Andrew P. Black wrote:

> I understand that "DeltaModules can be un/installed, but the
> un/install operation has no meaning for regular Modules" [1].  My
> question is: Why.  If un/install is such a good idea, why not have it
> for all Modules, not just for DeltaModules?

The reason is that Modules are independent and self-contained, and therefore
cannot be in conflict with each other. However, several packages may want to
apply DMs to the same module, and then there is a use for being able to
de/activate all but one of these DMs.

But while you could have two versions of Morphic loaded at the same time,
with no conflicts since they would be two different modules, you can't have
both running at the same time. E.g. who takes care of input events? But this
is not a conflict on a module level (which just defines code), but on a
component level (i.e. between parts of the system that are running).

> I just put my "loose
> methods" (like String >> asUrl) into a DeltaModule inside my Module,
> and the unsuspecting user who imports my module is still surprised
> that it "damages" other classes like String, or Form, or whatever.

I don't agree at all. DMs not only collect all external modifications in a
special place, instead of mixing them with the "proper" contents of your
module. They also associate such modifications with the modules where they
are made, so that by looking at the list of delta modules you can see
exactly what parts of the system a certain package modifies. Any
"unsuspecting user" would most probably look at what parts of the system a
module modifies before they use it. DMs make this easy. Can you
suggest any way to make it clearer?

This also serves to make it very clear to a conflict resolver in what parts
of the system conflicts arise with other loaded packages: conflicts need
only be handled in those modules where multiple sources want to have
modified versions of the same module.
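
As a rough illustration of why this bookkeeping helps a conflict resolver, here is a small Python sketch. The package and module names and both helper functions are made up for the example; the point is only that "what does this package modify" and "where can conflicts arise" become simple queries over the list of delta modules.

```python
from collections import defaultdict

# Hypothetical data: each entry says "this package has a DM for this base module".
delta_modules = [
    ("HTML", "Strings"),
    ("HTML", "Graphics"),
    ("Mail", "Strings"),
    ("Game", "Morphic"),
]


def modified_by(package):
    """Base modules that a package's delta modules touch."""
    return sorted(base for pkg, base in delta_modules if pkg == package)


def conflicts():
    """Base modules that more than one package wants a modified version of."""
    packages_per_base = defaultdict(set)
    for pkg, base in delta_modules:
        packages_per_base[base].add(pkg)
    return {base: sorted(pkgs)
            for base, pkgs in packages_per_base.items() if len(pkgs) > 1}
```

With this data, only the Strings module needs any conflict handling; Graphics and Morphic each have a single modifier.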

> Now, I had assumed up until this morning that a Module could contain
> not only whole classes, but also "class extensions", that is, groups
> of methods that could be added to existing classes, even though those
> classes might be defined in different modules.  (The String>>asUrl
> example again, which I had assumed could be part of the HTML module).

But the very purpose of modularity is to separate a system into smaller,
independent, self-contained pieces, whose definitions are not entangled in
each other. Modules really should not be allowed to modify the contents of
other modules. Allowing this would go against the very idea of
modularity--it is really a contradiction in terms, more or less. It is the
same violation as if an object were allowed to directly alter the
innards of another object: it breaks encapsulation.

>> A module
>> * will of course also have contents proper: classes, globals, etc.
>
> I notice that "proper contents" does not include methods.  Is this
> intentional?  Is this a mechanism restriction that is intended to
> make it hard for me to put the "wrong stuff" in my module, that is,
> to put in "loose methods" that change some other classes?
...
> If so, I admire
> your good intentions, but I think that you are misguided.

This is just because in ordinary Smalltalk you don't put methods anywhere
except as part of classes; this just follows that principle. You may
consider Smalltalk to be misguided when it enforces this principle, and
would then regard C++ as superior in this respect, but many would consider
this part of the essence and elegance of Smalltalk.  #Smalltalk contains
classes and other globals, but not loose methods; it's the same thing with
modules. In fact, DMs don't contain loose methods either: they are put in
class objects too, at least as it is now. It would be more work to support
the kind of exception you suggest than to simply put methods in classes
like now.

Dan stated earlier that while method extensions may be supported, they
shouldn't be encouraged (something to that effect), and I agree, since they
break the above principles. I see them as a kind of emergency solution or
exception, a concession to realities. They are never strictly necessary, I
believe--although I don't want to repeat the {} discussion here.

> Now I see that I might be wrong about this, and that Andreas' point
> of view would say that Modules shall contain _only_ whole classes,
> and DeltaModules shall contain _only_ class extensions and perhaps
> class retractions (deleting methods).

Since DMs are defined as differences in relation to a base module, per this
definition (and for superseding change sets) they strictly ought to be able
to define any possible differences in a module. However, some changes are
harder to support than others. Still, adding new classes should definitely
be possible, and it is easy to support (DMs being Modules, I think they
already do). But this would probably be used when DMs are used in the role
of change sets--a package will rarely need new classes to be added to other
modules.

> The email of yours in reply to Michael Rueger also said a lot of
> interesting things about modules being namespaces (not stated on the
> Wiki) and that these namespaces do not follow the hierarchic
> structure of the module nesting (not on the Wiki either).

In both cases, the information on how names are looked up was in the first
posting I made, and was copied into the first heading listed on the Swiki,
"design principles". However, Smalltalk is the only language I can think of
where different modules share a single namespace, so I may not have made it
clear enough, taking it for granted. Again, a shared namespace seems to go
against the very principle of modularity, where modules' contents should be
independent by definition. So I've been taking this feature for granted,
whereas obviously not all Smalltalkers do.

> The distinction between BasicModules (as I shall call them) and
> DeltaModules as I now understand it is:
>

As I said above, it now seems clearer that the only difference ought to be
that a Module is self-contained, whereas DeltaModules are represented as
differences wrt a base module. However, this apparently small difference
will lead to great differences in usage patterns.

> *  BasicModules can be used only to define whole classes, whereas
> DeltaModules can be used only to make additions, removals or changes to
> classes that are defined elsewhere.

Regular Modules should be complete and self-contained, as per the definition
of modularity, with e.g. complete classes, as usual in Smalltalk.

DMs should in principle be able to handle all possible differences wrt a
base module: new items, deletions, and changes.

> *  BasicModules can contain (sub) BasicModules, global specifications
> and Class specifications, but not DeltaModules.

Regular Modules can be linked to any other Module, regardless of their kind
(of course). PackageModules etc. are unnecessary.

> *  DeltaModules can contain (sub) DeltaModules and change
> specifications (but not BasicModules)

A DM really ought to only specify changes to the contents, and not itself
have any submodules. If your system X modifies module A and A's submodule B,
then X should have one delta module DM(A) and one DM(B). This is for
clarity: in this way you clearly see what parts X needs to modify.

Henrik

> Projects vs Modules
> ----------------------
> I have read the stuff on the Swiki and my naive view is this:
> A Module is a published piece of code.

Right, an abstract (piece of a) program with no reference whatsoever to how
it is represented, run, or anything. Effectively a sort-of-mathematical
abstraction.

> It is not a "binary deliverable".
> But you can download/load/activate them in your environment. I think it
> is good that they are not binary and I think it would be much harder to
> build such a system with the functionality that Henrik has put in there
> if they were binary. Being non binary also makes it possible to
> manipulate these things more easily. On the other hand the repository is
> a pluggable component so that we can have different kinds of
> repositories. For example, we could have a repository implementation that
> uses ImageSegments or something to increase loading speed.
> A Project is more of a published binary deliverable. It surely can/could
> contain one or more modules but it also contains live instances etc. It
> is in my eyes analogous to the "binary download" you see everywhere on
> the Internet - just click and go. Typically not used by developers when
> they need a module to use in their own project but more for Squeak
> endusers "surfing for content".

I would like to distinguish Projects from Components--a Component is a
concrete representation of the code in a Module, that can be dynamically
loaded and unloaded. (Components also imply dynamic hook-up and such, which
is a much smaller problem in Smalltalk than in e.g. C++, which has no
standard representation for program meta-objects. Smalltalk already has
all/most of that.) So a Component = a Module or Modules as concretely
represented in Squeak (class and compiledmethod objects, etc.).

A Project is more than just this.

> A Project is more of a published binary deliverable. It surely can/could
> contain one or more modules but it also contains live instances etc.

I assume what you read was <http://minnow.cc.gatech.edu/squeak/2058>.

Importantly, Projects have contents, as in "content provider", not just code
(but it may contain code). This is what you call live instances, and I think
this is the essential difference between Projects and Components.

What projects really are today is kind of vague, but the best notion I've
been able to come up with is "the kind of thing Alan wants kids to create
with their DynaBooks". But Alan/SqC also seems to have frequently changed
their mind about^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H refined the notion of what
Projects are.

I proposed Creation as the best name I could come up with instead of
Project, which is extraordinarily vague and generic for such a specific use,
and gives poor hints to what this "thing" is and does. A Creation is a
collection of "hand-made" objects that someone has created (with a nod to
Koestler with the colored planes). See the above page for a discussion of
this.

As I see it, as concepts, Components are layered on top of/incorporate
Modules, and Creations extend Components to include hand-made objects.
Eventually.

              _____
a Creation   (     )
              ¨¨|¨¨
              _____
Components   (     ) *      (0 or more)
              ¨¨|¨¨
              _____
Module(s)    (     ) +      (1 or more)
              ¨¨¨¨¨
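
The layering in the diagram can also be written down as data structures. The following Python sketch is only a model of the cardinalities shown (a Creation holds zero or more Components, a Component holds one or more Modules); the field names are assumptions and nothing here corresponds to existing Squeak classes.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    name: str  # abstract code, with no reference to how it is represented


@dataclass
class Component:
    modules: list  # one or more Modules, concretely loaded

    def __post_init__(self):
        if not self.modules:
            raise ValueError("a Component needs at least one Module")


@dataclass
class Creation:
    components: list = field(default_factory=list)        # zero or more
    handmade_objects: list = field(default_factory=list)  # non-code content


game = Creation([Component([Module("CoolGame")])], ["background image"])
```

A Creation with no Components at all is still valid in this model, since it may consist purely of hand-made objects.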

> Example:
> I write a cool Morphic game including classes for general animation. The
> code is composed of a Module representing the Game itself with 2 more
> general submodules for Animation and Sound. They are not really blessed
> to go into the "base image" so I put everything in "People/gh/CoolGame".
> Ok, I upload it all onto the virtual module server (more on that later)
> so that people easily can load this module.

Yes, this is enough if all your objects can be programmatically generated by
your code.

>
> But I also set up a Project where I start the Game up with a centered
> window and I also type in a todo-list in an open Workspace. Finally I
> smack in a nice looking background image and then publish the Project
> onto Bob's superswiki AND also to another superswiki in Sweden containing
> only Morphic games (just fantasy).

These are "hand-made" (i.e. non-programmatically generated) objects, which
code isn't sufficient for creating (without taking extreme measures). At
least code isn't convenient for creating them. You need the ability to
conveniently (distribute and) load _objects_. This is the added role I
assign to Creations. I think the name Creation also suggests that it has to
do with "hand-made" objects (vs. code-generated).

> So, developers typically load the module to get to the code and perhaps
> reuse my Animation module or just check out how the game was done.

Right, just to get the code, Modules are enough. Components/image segments
may be convenient (think the 1MB refactoring browser).

> End
> users typically find the project and load that - it probably is faster
> and they get a "just point and click" experience. They get the modules
> with the project, but that is just a bonus.

Right, this is to get the non-code objects too.

In general, I think Repositories should be a general facility for
conveniently storing and accessing anything Squeak-related, including e.g.
Projects/Creations, so that you can just give a path to a Project, or help
pages, or your personal preferences file, or whatever, and Squeak will be
able to locate the files for you.

> regards, Göran

So what we really need is for SqC to agree with this ;-)

Henrik

The idea behind these change sets is that they might eventually get used
to do such things as:

  - implement a local symbol table for each module (just add an instvar
called "symbols" to the Module class and have it hold onto a
SymbolSpace)
  - implement "Selector Name Spaces" by using LocalSymbols as keys in
method dictionaries
  - implement "Instance Variable Name Spaces" with a similar technique
(i.e. have a dictionary of "instvars" attached to an instance with
LocalSymbols for keys)

SymbolSpace includes an "owner" instance variable to make the lookup
from a LocalSelector to the owner of the SelectorSpace quick if needed.
Also, the refactoring might make other things (like Unicode support) a
tad easier.
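
For readers who want something concrete, here is a rough Python model of a symbol space with an owner. It only mirrors the idea described above (one canonical symbol per name per space, plus a back-pointer to the owner); the `intern` method name and the constructor arguments are assumptions, not the protocol of the released change sets.

```python
class LocalSymbol:
    """A symbol that is unique only within its SymbolSpace."""

    def __init__(self, name, space):
        self.name = name
        self.space = space

    def __repr__(self):
        return f"#{self.name}@{self.space.owner}"


class SymbolSpace:
    """Canonicalizes names into symbols; `owner` could be a module."""

    def __init__(self, owner):
        self.owner = owner
        self._table = {}

    def intern(self, name):
        # Return the one symbol for this name in this space, creating it once.
        if name not in self._table:
            self._table[name] = LocalSymbol(name, self)
        return self._table[name]


html = SymbolSpace(owner="HTML")
mail = SymbolSpace(owner="Mail")
```

Interning the same name twice in one space gives the same object, while two spaces give two distinct symbols that can both serve as dictionary keys.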

- Stephen

Anyway, I admit I do not really understand what you have done. What is
the difference between:

(Module @ #(People)) addAssoc: (Association key: #test value: 10) export: true.

And if you have Test defined in module #(People), you can write:
Test>>foo
  ^test

and what you have done?

Regards,
Alexandre

 I guess I should take a couple of steps back and explain a bit of the
motivation.  A while ago (8 or 9 months), I got interested in "selector
namespaces," which led me to a prototype implementation.  I found, much
to my surprise, that it was relatively easy to get method dictionaries
(and the tools) to work with keys other than symbols.  This in turn
allowed me to alter the compiler to generate messages that used
something other than a Symbol for the method selector (an instance of
Selector to be precise).

I took that a step further, and created a separate "selector space" for
each Morphic project.  The compiler would compile instances of Selector
from a selector space local to the project (rather than instances of
symbol).  This allowed me to code within a project and be completely
isolated from the rest of the system.  I could, for example, rewrite the
OrderedCollection>>add: method, break it, and not take down squeak.  The
code I developed within the project would happily call my broken version
of #add:, but nothing else would.  Additionally, methods that were
changed or added in my project would be entered into the method
dictionary with the local Selector as the key (as opposed to the global
symbol).  So, it was trivial to be able to determine everything about
the system (in terms of methods) that I had modified within my project
(with the notable exception of changes in class shape).  A similar
technique could even be applied to class variables.
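
The following Python sketch models the isolation described above. It is not the prototype itself: the `Selector` class, the "Sandbox" project name, and the `compile_selector` and `changed_in` helpers are invented for illustration, and methods are represented by plain functions.

```python
class Selector:
    """A method-dictionary key; `project` is None for the global space."""

    def __init__(self, name, project=None):
        self.name = name
        self.project = project


GLOBAL_ADD = Selector("add:")
SANDBOX_ADD = Selector("add:", project="Sandbox")

# One method dictionary holds both versions under distinct keys.
ordered_collection_methods = {
    GLOBAL_ADD: lambda items, x: items + [x],
    SANDBOX_ADD: lambda items, x: items,  # deliberately broken rewrite
}


def compile_selector(name, project):
    """Pick the key a compiler would embed for code written in `project`."""
    for key in ordered_collection_methods:
        if key.name == name and key.project == project:
            return key
    # No project-local version exists: fall back to the global selector.
    return next(k for k in ordered_collection_methods
                if k.name == name and k.project is None)


def changed_in(project):
    """Everything the project has modified: just filter keys by project."""
    return [k.name for k in ordered_collection_methods if k.project == project]
```

Code compiled inside the Sandbox project reaches the broken `add:`, everything else still reaches the working one, and the list of the project's changes falls out of the keys.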

Getting back to your question...you seem to be attempting to figure out
how symbol spaces and modules would work together.  An important thing
to understand is that the code I released has no relation to the module
system whatsoever.  Well...except that I intend to use symbol spaces to
explore new ways of isolating modules from one another (but that is not
the only potential use for symbol spaces).

The example you give looks like you're showing how to access an object
stored within the namespace of a module.  This is sort-of the opposite
of what symbol spaces allow you to do.

Your example allows: objects to have similar names, but reside in
physically separate dictionaries
Selector spaces allow: objects to have similar names, but reside in the
same dictionary (and be kept separate of course)

It's really two ways to skin the same cat in other words.  There are
situations where having physically separate dictionaries is not an
option (or at least not easy)...method dictionaries are one example of
that.  You could implement namespaces in the Smalltalk system by keeping
a single global dictionary, but making the keys be LocalSymbols (instead
of global symbols).  I don't advocate that however.
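
The two approaches can be put side by side in a small Python sketch. The namespace names and values are made up, and the `(space, name)` tuples merely stand in for LocalSymbols.

```python
# Approach 1: physically separate dictionaries, one per namespace.
people_namespace = {"test": 10}
other_namespace = {"test": 99}

# Approach 2: one shared dictionary whose keys carry their space.
shared = {
    ("People", "test"): 10,
    ("Other", "test"): 99,
}


def lookup_separate(namespace, name):
    """Choose the dictionary first, then look up the plain name."""
    return namespace[name]


def lookup_shared(space, name):
    """Look up a space-qualified key in the single shared dictionary."""
    return shared[(space, name)]
```

Both give the same answers; the second form is the one that still works when there can only be a single dictionary, as with a class's method dictionary.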

There is also this tangential issue (and Henrik's favorite example)
about what to do with methods like String>>asUrl.  There are basically
three camps:

 1) those that would have you re-write this to be "Url
class>>fromString:" and would dis-allow String>>asUrl (saying that it's
poor design)
 2) those that think String>>asUrl is only acceptable if it is generally
useful (however, there is a flaw in this approach, what if the Url
module is not loaded?  What do you do in String>>asUrl if it's in the
Strings module?)
 3) those that think String>>asUrl is natural, and should be accommodated
somehow...even in the presence of a module system

In other words, the problem is complex, and the challenge is how to
effectively deal with that complexity (remember, complexity cannot be
removed, it can only be shifted from one place to another).

I think it's a worthy goal to attempt to allow one module to define the
String class, and another module to define the #asUrl method.  Is it
possible to do in a clean and simple way?  I don't know, but it isn't
clean and simple in the current Smalltalk meta-model, or the current
state of the module enhancements to the meta-model.  I guess to some
degree, it all comes down to which of the following you find more
pleasant and natural?

  'http://www.yahoo.com' asUrl retrieve

      - or -

  (Url fromString: 'http://www.yahoo.com') retrieve

So...the current implementation of Symbol spaces:

 - allows me to play with further concepts in module organization
 - allows others to play with other String and Selector enhancements (by
pushing the protocol up to abstract superclasses that allow subclassing
for polymorphic substitution anywhere a String or Symbol is called for)
 - reifies the Symbol class code for managing the global symbol table
into GlobalSymbolSpace
 - allows anyone who needs it to have a separate space for symbol
canonicalization

It's hard to describe this stuff in text.  I've had the best success in
getting the ideas across when I demonstrate being able to jump between
projects; change any code anywhere without regard to whether it's going
to crash the system or conflict with another project; capture
everything that I change in the meta-model (as opposed to change sets,
which can easily get confused); and have the code in various projects be
automatically active, without needing to push or pull things in and out
of method dictionaries (which also means that you could have more than
one project active at a time in a given image).

- Stephen

This may or may not help...One way to describe what Stephen has done is to
create a distinction between "selector values" and "selector names" that
is analogous to the distinction between actual variables and variable
names. Just as a single variable name may be bound to multiple distinct
variables at different times and in different parts of a program so can a
single selector name be bound to multiple distinct selector values in
different parts of the program.

In Classic Smalltalk-80, selector names and selector values are the same
thing -- they can not be separated. Stephen's work splits the two concepts.
Selector values are the things that actually identify methods and drive
method resolution. Selector names are just the identifiers that are used
when writing a method to refer to a selector value. This binding between a
selector name and a selector value may be context sensitive.
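
A small Python sketch may make the name/value split more tangible. Everything in it is hypothetical (the `SelectorValue` class, the context names, and the fallback rule to a global binding); it only shows a method dictionary keyed by values while source code refers to names.

```python
class SelectorValue:
    """Identity-compared key that actually drives method lookup."""

    def __init__(self, label):
        self.label = label


GLOBAL_PRINT = SelectorValue("print (global)")
MODULE_A_PRINT = SelectorValue("print (ModuleA)")

# Bindings from selector *names* to selector *values*, per context.
bindings = {
    "global": {"print": GLOBAL_PRINT},
    "ModuleA": {"print": MODULE_A_PRINT},
}

# The method dictionary is keyed by values, never by names.
methods = {
    GLOBAL_PRINT: lambda: "plain output",
    MODULE_A_PRINT: lambda: "ModuleA output",
}


def resolve(name, context):
    """Bind a selector name in a context, falling back to the global binding."""
    return bindings.get(context, {}).get(name) or bindings["global"][name]


def send(name, context):
    """Send the message named `name` as written in code from `context`."""
    return methods[resolve(name, context)]()
```

The same name `print` reaches different methods depending on where the sending code was written, which is the context-sensitive binding described above.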

For those of you who aren't already familiar with it, the second half of my
"Smalltalk Subsystems" proposal
(http://www.smalltalksystems.com/publications/subsys.pdf) describes various
interesting things that can be done once you make a distinction between
selector values and selector names.  (It starts on page 6 with the section
titled "Message Selector Conflicts".)

My major reservation about this approach has always been a concern about
understandability and usability. While the distinction between a name and
its binding is normal and obvious for compiler writers like me, it's not so
clear that the distinction is all that obvious to many Smalltalk
programmers.  Symbol==method selector==method lookup key is a nice concrete
concept that is easy to explain. Adding the extra level of indirection
probably adds an "order of magnitude" to the conceptual complexity of this
part of the language.

Allen Wirfs-Brock

 Yes, your (Allen Wirfs-Brock's) paper does a much better job of
explaining it.  I would have liked to have seen an example that dealt
with the issues of class extensions a little more closely.  The security
example seems less useful to me.

Let me re-iterate what symbol spaces is and is not:

  It is:
    - a refactoring of String and Symbol to allow subclassing those
protocols
    - a refactoring to replace the global symbol table code with the
SymbolSpace and GlobalSymbolSpace classes
    - the introduction of LocalSymbol designed to work with an instance
of SymbolSpace
  It is not:
    - an enhancement to the module system
    - an implementation of selector spaces
    - a change to the compiler or vm

To me, this code represents something that is digestible and feels right
without getting into the thorny issues of modularization.  In other
words, it's an enhancement that could allow for experimentation in those
directions, but doesn't itself do anything in those areas.  I definitely
like the fact that the symbol table management is handled by a class
(instead of a collection of class-side methods), and that it's possible to
have more than one of them.

Now, getting back to the fun topic of "Selector Spaces" and
modularization:

Allen said:
> My major reservation about this approach has always been a concern about
> understandability and usability. While the distinction between a name and
> its binding is normal and obvious for compiler writers like me, it's not so
> clear that the distinction is all that obvious to many Smalltalk
> programmers.  Symbol==method selector==method lookup key is a nice concrete
> concept that is easy to explain. Adding the extra level of indirection
> probably adds an "order of magnitude" to the conceptual complexity of this
> part of the language.

As someone famous once said, you cannot remove complexity, you can only
shift it from one place to another.  We deal with this complexity
every day.  Anyone who's ever tried to maintain a significant body of
code in Squeak from one version to another knows this complexity.  So
too the people who've tried to strip the image down to some fundamental
core plus their application.  The trick is to relieve our brains of this
burden and design an environment that handles this complexity for us.

I believe that this issue can be tackled without any loss in
understandability or usability.  It will take a combination of
meta-model changes, potential vm changes, and tool enhancements.  And,
the tools are very important.  The example I like to give to illustrate
this fact is that it's possible to develop code in Smalltalk using only
inspectors.  However, being successful in developing software in that
way would require that you cope with an enormous amount of complexity.
The point is that the class browser (a tool) is much more effective at
conveying the underlying structure of the Smalltalk meta-model than the
inspector.  Thus, the tools are very important (which btw, is a point
that shouldn't be neglected when thinking about how to make the modules
more usable too).

Now for some gory (but fun) implementation stuff.  A thought recently
(last few hours) occurred to me about binding selector names to selector
values with the compiler.  I was thinking through the issue of local
symbols masking out global symbols.  The issue is that I would want the
compiler to generate local symbols in some circumstances, and globals in
others.  One way to solve this is to introduce new syntax (ick).
Another way is to only generate locals when the local is defined in the
current module, or in an imported module (or one of the imported
modules' imported modules).  However, the latter solution has the
following problem:

  Suppose that ModuleA extends OrderedCollection by adding a new #add:
method.
  This new method is entered in the method dictionary with a local
symbol.
  Now suppose some code in ModuleA sends the #add: method to a
LinkedList.
  The compiler compiles in the local symbol for #add:.
  LinkedList is not a subclass of OrderedCollection.
  MessageNotUnderstood!

The thought that I had is that rather than compile in a direct reference
to a selector value, we instead compile in an Array of selector values
ordered from most preferable (the local selector) to least preferable
(the global).  Method lookup in the VM would be altered to accommodate
the new lookup semantics and the global #add: would be found for the
LinkedList.
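
To make the fallback concrete, here is a Python model of lookup with an ordered list of selector values. The `Klass` objects, the tuples standing in for selector values, and the loop order inside `lookup` are all assumptions; a real version would live in the VM's method lookup.

```python
class Klass:
    """A toy class object: a superclass link plus a method dictionary."""

    def __init__(self, name, superclass=None):
        self.name = name
        self.superclass = superclass
        self.methods = {}


LOCAL_ADD = ("ModuleA", "add:")   # stands in for the local selector value
GLOBAL_ADD = (None, "add:")       # stands in for the global symbol

object_ = Klass("Object")
ordered = Klass("OrderedCollection", object_)
linked = Klass("LinkedList", object_)
ordered.methods[GLOBAL_ADD] = "OrderedCollection>>add: (global)"
ordered.methods[LOCAL_ADD] = "OrderedCollection>>add: (ModuleA)"
linked.methods[GLOBAL_ADD] = "LinkedList>>add: (global)"


def lookup(receiver_class, candidates):
    """Try each candidate selector in preference order up the class chain."""
    for selector in candidates:
        klass = receiver_class
        while klass is not None:
            if selector in klass.methods:
                return klass.methods[selector]
            klass = klass.superclass
    raise LookupError("MessageNotUnderstood")


send_site = [LOCAL_ADD, GLOBAL_ADD]  # what the compiler would embed
```

With only `LOCAL_ADD` compiled in, a send to a LinkedList fails; with the ordered pair, the same send site finds the local method on OrderedCollection and the global one on LinkedList.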

Now for a slight complication.  A subclass of OrderedCollection has
overridden the #add: method (in the global symbol space).  What do we
do?  A) invoke the global version of #add: for the subclass (knowing
that a super send in that method would not invoke our local #add:
method) or B) invoke the inherited local #add: without regard to the
subclass' global #add:?

My sense is that B is the more correct thing to do if you provide a
facility to explicitly invoke the global version of #add: from within
the local implementation of #add: and if the tools can alert you when
you're localizing an existing method that has global overrides in
subclasses.  This is the area where tools can make all the difference in
the world.
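
The difference between options A and B comes down to which loop is the outer one. The Python model below shows both; the subclass name `MyCollection`, the tuple-based selector values, and the two lookup functions are hypothetical and only illustrate the two policies.

```python
class Klass:
    """A toy class object: a superclass link plus a method dictionary."""

    def __init__(self, name, superclass=None):
        self.name = name
        self.superclass = superclass
        self.methods = {}


LOCAL_ADD = ("ModuleA", "add:")
GLOBAL_ADD = (None, "add:")

ordered = Klass("OrderedCollection")
sub = Klass("MyCollection", ordered)  # hypothetical subclass
ordered.methods[LOCAL_ADD] = "OrderedCollection>>add: (ModuleA)"
ordered.methods[GLOBAL_ADD] = "OrderedCollection>>add: (global)"
sub.methods[GLOBAL_ADD] = "MyCollection>>add: (global override)"


def chain(klass):
    """Yield the class and then each of its superclasses in turn."""
    while klass is not None:
        yield klass
        klass = klass.superclass


def lookup_a(receiver_class, candidates):
    """Option A: the nearest class wins, whichever candidate it defines."""
    for klass in chain(receiver_class):
        for selector in candidates:
            if selector in klass.methods:
                return klass.methods[selector]
    raise LookupError("MessageNotUnderstood")


def lookup_b(receiver_class, candidates):
    """Option B: the preferred selector wins, even if it is inherited."""
    for selector in candidates:
        for klass in chain(receiver_class):
            if selector in klass.methods:
                return klass.methods[selector]
    raise LookupError("MessageNotUnderstood")
```

For a send of `add:` to a MyCollection from ModuleA, option A runs the subclass's global override and option B runs the inherited local method, which is why B needs the explicit "call the global version" facility and the tool warning mentioned above.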

(The interesting thing is that none of this complexity being discussed
is new...we all deal with these issues anytime we attempt to merge or
migrate code...we just don't have a framework for understanding and
managing this complexity)

- Stephen

One of the beautiful points about Henrik's design (at least the way I
see it) is that a module is clean by definition. A module doesn't contain
any of the "ugly" stuff (modifications to other classes etc.) so it's a
beauty in itself. If you get the module you get everything that makes it a
piece of functionality. For embedding it into the environment you need
deltas, that extend the existing environment to cooperate with the module.
No point in "messing up" a module with all that stuff. I like this idea.

Cheers,
  - Andreas