The Chuck type inferencer is now available on SqueakMap. This is a
tool I've been working on for a few years that can answer questions
about the types of variables and expressions by searching through your
program code. For example, you can ask for the type of a variable x,
and it will go find every expression that writes into x (e.g., x :=
y), and find the types of the right-hand-sides, and so on recursively.
It's similar to what people do by hand anyway to find types, but Chuck
automates it.
Additionally, this kind of tool should be useful as a building block
for other tools. First, Chuck should be a useful as part of the
backend for for programming tools such as compilers and dead code
removers. Second, the "AutoChuck" part of it maintains a handy model
of all the system's source code along with various lookup tables like
"who writes into this variable"; this part alone should be useful to
people who want to do any sort of investigations about the source code
of the system.
If you want to play with the inferencer, you need to grab Squeak 3.7 (I
am at update 5707), install the Refactoring Browser, and then install
the Chuck package from SqueakMap. From there, you can run and read
some test cases. For serious day to day use, you then need to turn on
AutoChuck, which is a preference in the "browsing" area of the
preferences browser. This will spend 10-60 minutes filling up your
image with 75 megabytes or so of lookup tables; after that, it should
be invisible to your normal activities, and the refactoring browser
will have two new menu items available:
1. "infer type", in code panes. Select any expression and you can run
this, and it will dig up a type for you.
2. "specific senders of", in message-list panes. It acts like
senders-of but uses type inference on the message-senders in order to
reject some of the message senders as impossible.
More serious users may want to tweak the "preferences" class methods of
ChuckInferencer; you can tweak how many goals the inferencer will use
and how much wall-clock time it will spend before bailing out. (I
intend to extend the ChuckConfigurator dialog to let users set these
things in a GUI, but that hasn't happened yet.)
The project home page is:
http://www.cc.gatech.edu/~lex/chuck/
This page includes a good description of the algorithm, although more
description -- about 100-150 pages more -- is on the way by next May.
A quick word on what this type inferencer is giving you. The types it
infers are guaranteed bounds on what the variables or expressions will
evaluate to at runtime. They do not represent what the programmer
intended to be possible for the variable, and they are not necessarily
a good summary of what the variable is.
An example should help. If you infer a type of PlayingCardDeck's
stackingPolicy variable, Chuck will tell you that it will hold one of
these things: nil, #stagger, #straight, #altStraight, or #single. The
comment of the variable tells a slightly different story! First, the
comment gives #none as an option while the type inference does not; the
reason is that nothing in the image uses #none, even though the code
can handle it. Second, the comment does *not* mention #stagger as an
option. #stagger does indeed get assigned to the variable, so the code
and the comment are out of sync here. The first item is more
interesting I think: for this tool to be effective, you want to have
uses of the code loaded in addition to the code you are studying.
Finally, some people may be curious about the general approach of this
thing. There is some documentation on the web site, and some more
serious documentation being developed, but let me give a quick summary
because Chuck is a little unusual. Essentially, Chuck posts goals to
itself and then tries to solve them. In the course of solving a goal,
it is allowed to post more goals. Each goal is initially given a very
optimistic tentative solution (e.g., this variable is never assiged any
value), and then the algorithm repeatedly updates the tentative
solutions, making them less precise, until it has found an entire set
of solutions that are consistent with each other. At that point the
tentative solutions can be treated as full solutions.
This is different from other algorithms I know of, which do one of two
things: they simulate the program in the abstract, using types instead
of values, or, they generate a huge pile of constraints ("type of x is
greater than type of y"...), and solve them all. What Chuck's approach
allows it to do, however, is to *prune* its subproblems. That is,
Chuck can at any time answer a goal with a trivial but poor answer; the
subgoals of such a goal then become unnecessary and can be abandoned.
If Chuck is trying to find the type of "node isNil", and it is unable
to infer the type of "node", it has the option of substituting
"Anything" for the type of node and then proceding onwards (in this
case, still finding the optimal solution of {True False}). In short,
Chuck's architecture lets its performance degrade gracefully as
programs get large. The existing algorithms seem to get into trouble
with around 30-50 kloc programs, whereas Squeak has about 300 klocs.
Okay, I guess it worked it my from-scratch image because it grabbed
stuff from the same cache directory.
One way that seems to work is to open a Monticello Browser, then do
+Repository, and then fill in the repository info, which is:
MCHttpRepository
location: 'http://kilana.unibe.ch:8888/Chuck'
user: 'ls'
password: ''
After that, installing from SqueakMap will work.
> information which could then be used by several tools, such as a
> type inferencer,
> an the toy I was doing, which was collecting run-time types, so they
> could "collaborate".
Oh yes, this is a great tool too, and it complements what Chuck does.
Chuck examines the program and gives you an upper bound on what the
types might ever be; your tool examines executions and thus finds a
*lower* bound on the types. Your tool is, I'm sure, consistently fast;
on the other hand, Chuck will find a good many answers even in code
that has yet to be executed. This is nice if you have not yet figured
out how to run the code you are studying!
Also, upper bounds on the types have different applications in
downstream tools such as compilers and dead code removers; a lower
bound can inform the compiler that a case is worth optimizing (e.g.
this method really does get run with FloatArray's), and an upper bound
can let the compiler *avoid* dealing with certain cases (e.g., this
message send can only ever invoke one particular method, so inline it).