Magma: Installation, Concurrency, Reliability

Below is part 1 of my notes from installing and using the most recent 
version of Magma this morning.  They are literally just a log of what I am 
doing and what successes and failures I come across, so they will probably 
be boring for the casual reader, but I hope they provide you with some 
useful info.

Started by installing the Magma server package into a fresh 3.7b-5969 
image, and then saving two copies: magmaserver.image and 
magmaclient.image.  Both are running on Squeak 3.7.1beta2 Carbon VMs on 
Mac OS X 10.3.

Following the instructions at http://minnow.cc.gatech.edu/squeak/2689, I 
executed the following code in a workspace:

MagmaRepositoryController
	create: 'magma/myrepos.magma'
	root: Dictionary new
	
No errors.

I then tried this:

MagmaServerConsole new
	open: 'magma/myrepos.magma'
	processOn: 51969

And got an error saying "magma/myrepos.magma" was not found.

Ok, so creating the repository didn't work.  Maybe I need to create the 
'magma' directory myself first?  If so, I would have expected the error 
to occur when trying to create the repository, not while trying to 
open it, but let's try that.

Saba:~ avi$ cd Documents/Squeak
Saba:~/Documents/Squeak avi$ mkdir magma

Now trying this code snippet again:

MagmaRepositoryController
	create: 'magma/myrepos.magma'
	root: Dictionary new

And I get the following error:

MessageNotUnderstood: UndefinedObject>>binary

This is inside MaObjectFiler>>createDataFile.

Ok, maybe I screwed things up the first time.  Looking around for a way 
to reset the state, I find MagmaRepositoryController class>>initialize, 
and try it.  Nope, doesn't fix it.  Ok, I'll trash this image and start 
again.

From a fresh image, the same problem.  Maybe if I create the file first?

Saba:~/Documents/Squeak avi$ touch magma/myrepos.magma

No dice.

Just as a sanity check, I try "FileStream fileNamed: 
'magma/myrepos.magma'" and get back a working filestream.  Poke around 
in the code...

<several minutes later>

The issue seems to be a strange interaction between MaFilename and the 
Carbon VM.  Although FileStream knows how to expand 
'magma/myrepos.magma' into a suitable absolute path for Carbon, 
MaFilename doesn't (it tries to expand it into a unix filename 
instead).  If I use FileStream to give me back the absolute path, and 
then feed that to Magma, things work.  Ok, we now have a server up and 
running.  Good.
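A minimal sketch of that workaround, assuming Squeak's FileDirectory>>fullNameFor: resolves the relative name the same way FileStream does on the Carbon VM, and that the magma directory has already been created (as with mkdir above). Only the Magma messages already used in this log are sent; the variable name path is illustrative.

```smalltalk
"Workaround sketch: let FileDirectory (which knows the platform's
 path conventions) produce the absolute path, and hand Magma only
 that absolute path, so MaFilename never has to expand a relative one."
| path |
path := FileDirectory default fullNameFor: 'magma/myrepos.magma'.
MagmaRepositoryController
	create: path
	root: Dictionary new.
MagmaServerConsole new
	open: path
	processOn: 51969
```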

I'll send this before I continue, and just make the point (I hope, 
constructively) that this is part of what I mean by "too much code": 
Magma uses its own Filename class instead of FileStream.  Now, 
FileStream has its issues, and I can perfectly understand the 
temptation to do that.  However, FileStream is known to work, on all 
platforms, on all VMs - because if it didn't, people would complain 
very quickly.  Because Magma rolls its own filename handling, it 
doesn't get to benefit from this, and in fact, MaFilename is broken on 
the platform I happen to be on.  Incidentally, this is one of the 
problems I ran into when trying out an early version of Magma (that 
time, on Linux, I believe), and it's interesting to see that it's still 
there.

Ok, now that the server is up and running, switching to the client 
image.  I create a session with:

	mySession := MagmaSession
		hostAddress: #(127 0 0 1) asByteArray
		port: 51969.
	mySession connectAs: 'avi'

And follow the instructions on the wiki to commit something to the 
root.  No errors.

I create a second session and try to pull my data out of the root.  
Success!  Great.  That was easy.

I try modifying the root in mySession2, and then inspecting it in 
mySession.  The change came across.  Cool.

Ok, let's try some concurrency.

	mySession begin.
	mySession root at: 'test' put: 7.
	mySession2 commit: [mySession2 root at: 'test' put: 8].
	mySession root at: 'test'  "==> 7"

Good - we're inside a transaction so we don't see the change from 
mySession2.  What if we try to commit?

	mySession commit.

No errors... what happened?
	
	mySession root at: 'test' "==> 8"

Ok... so the commit clearly failed, due to the conflict with 
mySession2, which is what I would expect.  But is there a way to get a 
notification of this?  I would probably want to retry the transaction 
when this happened.

Browsing through the MagmaSession protocol looking for a way to get 
this, nothing jumps out at me.  Chris?

Next, try some performance tests.  Commit a medium-sized collection of 
1000 Points:

	Time millisecondsToRun: [
		mySession commit: [
			mySession root at: 'test2' put: ((1 to: 1000) collect: [:i | i @ i])]]

Executes in 2.4s.  That's pretty good.  What about to retrieve it?

	Time millisecondsToRun: [mySession2 root at: 'test2']

Hm... very strange things are going on.  This seems to fail silently - 
I never get a number back.  Inspecting mySession2 root shows only 
'test' as a key.  #basicInspect shows the same thing.  I try 
"mySession2 root at: 'test3'" and get an emergency evaluator, then try 
it again and get the same silent-failure behavior.  This doesn't bode 
well.

Ok, let's try this with a third session.  Aha, that worked: 621ms to 
retrieve.  Try it again, expecting it to be instant (since the objects 
should be cached now), but get ~300ms every time.  Interesting - why is 
that?

I'm also curious how many of the objects in the collection actually got 
brought in during that 621ms.  Let's try to force them all in:

Time millisecondsToRun: [(mySession3 root at: 'test2') do: [:ea | ea yourself]]

Again, about 300ms each time.  Ok, I clearly don't understand the 
caching model here.

Since I'm doing some timings, I might as well compare this to GOODS.  
Start up a GOODS image...

Time millisecondsToRun: [db root at: 'test2' put: ((1 to: 1000) collect: [:i | i @ i]). db commit]

1.6s.  Not too different.  And retrieval?

Time millisecondsToRun: [db2 root at: 'test2']

GOODS gets 15ms on the first run, 0ms thereafter.  Again, let's force 
it to bring everything in:

Time millisecondsToRun: [(db2 root at: 'test2') do: [:ea | ea yourself]]

727ms the first time, 3ms thereafter.  Two things we seem to be seeing 
in GOODS but not in Magma: not bringing in all the objects until 
they're needed (it only took 15ms to bring in the collection itself, 
the big hit wasn't until accessing its members), and caching the 
objects once they're there.  Chris, how could I get Magma to do the 
same things, if I wanted to?
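To put the cold and warm numbers side by side, a small helper block can time the same traversal twice in one evaluation. timeTwice is a hypothetical workspace variable; it only sends #root and #at:, which both mySession3 and the GOODS db2 are used with above, and, like the loop above, it assumes that sending #yourself to each element is enough to force it in. Since Magma and GOODS ran in separate images here, each Transcript line would be evaluated in its own image.

```smalltalk
"Hypothetical helper: time a full traversal of the collection stored
 under aKey twice in a row, answering #(firstRunMs secondRunMs).
 Works with anything that answers a dictionary-like #root."
timeTwice := [:db :aKey |
	(1 to: 2) collect: [:run |
		Time millisecondsToRun: [
			(db root at: aKey) do: [:ea | ea yourself]]]].

"In the Magma client image:"
Transcript cr; show: 'Magma: ', (timeTwice value: mySession3 value: 'test2') printString.
"In the GOODS image (after defining timeTwice there too):"
Transcript cr; show: 'GOODS: ', (timeTwice value: db2 value: 'test2') printString.
```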

Ok, that's enough for part II - time to grab some lunch.  I especially 
want to know what happened to mySession2, which seems to be permanently 
hosed now.

(My continuing mission to explore new databases, to seek out strange 
bugs, to... never mind).

Ok, the next thing I'm interested in is reliability.  IIUC, Magma 
doesn't have a transaction log, which means that reliability is 
definitely a worry: is it possible for me to corrupt the database?  How 
easy is it to lose data?

I'm starting a new session, and I'm going to leave it in a loop 
committing the current time as quickly as it can:

[[mySession commit: [mySession root at: 'now' put: Time now]] repeat] fork

Actually this is interesting - what happens when I try to read that 
time, while the commit loop is running?  Create a new session, try to 
access that key:

mySession2 root at: 'now'

Hmm, back to the silent-failure issue from part II.

Create a third session, same problem.

Well, hm, what do I do now?  I guess I can halt the ever-committing 
process and see what happens then.

Open the ProcessBrowser to try to kill the process (next time, keep a 
reference to it in the workspace), and the only one that seems right is 
currently in UndefinedObject>>handleSignal:.  Maybe we hit an error 
while trying to commit?  Try to debug the process, my image hangs.  Ok, 
trash that one.
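Following that note-to-self about keeping a reference, here is a sketch of a commit loop that can be stopped without the ProcessBrowser. keepCommitting and committer are hypothetical workspace variables. The loop checks a flag between commits, so it stops after the current commit finishes instead of being killed in the middle of one, which Chris's reply below warns can leave Magma in a funky state.

```smalltalk
"Hypothetical workspace variables: keepCommitting is a stop flag,
 committer holds the forked Process so it can be inspected later."
keepCommitting := true.
committer := [[keepCommitting] whileTrue: [
		mySession commit: [mySession root at: 'now' put: Time now]]] fork.

"Later, in a separate evaluation: ask the loop to finish its current
 commit and then exit, rather than terminating it mid-commit."
keepCommitting := false.
```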

New client image, open a new session, which seems to work - the server 
survived.  That's good.  But there's nothing at 'now' in the root.  Did 
*none* of those commits work?

Let's try that again with a delay between commits.

[[mySession commit: [
	(Delay forSeconds: 1) wait.
	mySession root at: 'now' put: Time now]] repeat] fork

Same problem: trying to access 'now' doesn't work.

Well, maybe I've screwed up the root object somehow.  There doesn't seem to 
be a way to reset the root, so I'll start a new repository on the 
server.

Start a new repository, start a server console going, try to repeat the 
test, hit the exact same issue.  I wonder - are there issues with 
having concurrent client sessions in different threads?  Instead of 
forking, let's try repeating this commit some finite number of times...

1000 timesRepeat: [mySession commit: [mySession root at: 'now' put: Time now]]

Ok, now try to access it:

mySession2 root at: 'now'

Oops, same problem.  Create a new session?

mySession3 root at: 'now'

Ok, that worked.

Now, let's try killing the server while that repeating commit is going 
on.

1000 timesRepeat: [mySession commit: [mySession root at: 'now' put: Time now]]

First gently: I'm saving and quitting the server image.  Some notifiers 
popped up about SharedQueue not being empty, but the image quit anyway.

Just out of curiosity, try to use the client session while the server 
is down.  My image locks up.  After a while, I manage to interrupt it, 
inside a critical section in MaTcpRequestServerLink>>submit:.

Ok, I bring the server image back up, close the notifiers, and start 
the console going again.

Try to connect from the client, but it locks again.  Do I need to 
manually signal that semaphore?

Try that, but still can't connect.  Let's switch to a new client image 
too.

Ok, that took quite a while, but it did connect.  Try to inspect the 
root, but all I get is the silent-failure thing again.

If I look back at the server image, I've got an MNU for UndefinedObject 
of #maRead: bytes:bytesFromPosition:of:atFilePosition:.  Looks like the 
file is nil?  But it does exist:

Saba:~/Documents/Squeak/magma avi$ ls -l
total 192
-rw-r--r--  1 avi  staff  44592  9 Jul 12:40 myrepos.magma
-rw-r--r--  1 avi  staff  53184  9 Jul 14:21 myrepos2.magma

Hm... maybe starting up the server console on restart was a bad idea, 
maybe it does that for me and I should leave it alone.  Quit the server 
image, start it up again.  Use a fresh client image too.

Nope, same problem.  Maybe I hosed the server image by saving and 
quitting it?  I'll try a brand new server image and use the same 
database file.

Ok, that seems to work, and I get the root back ok.  Lesson learned: do 
not save and quit a running Magma server.

Next question - what happens if I kill the process instead?  Start the 
repeating commit going, then force quit the server.
Start up a new server image.  Trying to connect from the old client 
image hangs; start a new client image too.

Ok, can connect, but can't get the root - same old strange 
silent-failure business.  Now I'm really stuck - this is a fresh server 
image and a fresh client image, and all I've tried to do is connect and 
look at the root.  Is my data gone forever?

Well, I think that's about as far as I'm willing to go today.  Frankly, 
it's a lot further than I would go if I were evaluating Magma for use, 
rather than trying to give as much feedback as possible - I hit enough 
issues along the way that I would have long since lost the ability to 
muster the 110% confidence I would need to entrust my data to it.  But 
I recognize that it's a work in progress, and so I hope my notes prove 
to be useful in taking it further.

---------------------------------------

Hi Avi, I've now had time to go through all three "Magma notes" and 
thought it would be easier to summarize my findings in one note.  After 
reviewing all the notes in detail and attempting to reproduce everything, 
the short answer is that I discovered no significant issues.  All the 
tests that you performed either worked for me or worked as designed, just 
not the way you expected.

My main purpose in reporting this is to clarify that the current release, 
"1.0gamma7", truly is "gamma" quality IMO, not the "alpha" or "beta" quality 
that your poor experience may, at least somewhat, have suggested.

> Chris, if there are particular issues I found that you want help in
> reproducing, just let me know.

Yes, I think the next step is to try to establish reproducibility for some 
of the issues you encountered, ideally with a straight-run script that 
demonstrates a particular problem.  My guess is that you will not be able to 
produce a script that locks the image, corrupts data, produces 
inconsistent results, or anything else really bad like that unless you go 
outside the bounds of what Magma "supports".  (I plan to add a Swiki page 
later this weekend that spells out Magma's boundaries and limitations; 
hopefully that will help clear the air too.)
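As an example of the kind of straight-run script meant here, the commit-conflict test from part II could be packaged as below. The session variables s1 and s2 and the user names are hypothetical; MagmaCommitError is the exception named in the notes-2 discussion. The script only reports to the Transcript whether a conflict was signaled, so it shows the same result whatever the debug preference is set to.

```smalltalk
"Straight-run repro sketch of the commit-conflict test.
 s1, s2 and the user names are hypothetical."
s1 := MagmaSession hostAddress: #(127 0 0 1) asByteArray port: 51969.
s1 connectAs: 'avi'.
s2 := MagmaSession hostAddress: #(127 0 0 1) asByteArray port: 51969.
s2 connectAs: 'avi2'.

s1 begin.
s1 root at: 'test' put: 7.
"s2 commits a conflicting change while s1's transaction is still open."
s2 commit: [s2 root at: 'test' put: 8].

[s1 commit.
 Transcript cr; show: 'no conflict signaled; root at test = ',
	(s1 root at: 'test') printString]
	on: MagmaCommitError
	do: [:err | Transcript cr; show: 'conflict signaled: ', err description]
```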

> is it possible for me to corrupt the database?

Yes, there is no transaction logging, so a hardware failure in mid-write 
would leave half a transaction committed.  It is planned for the future, 
but not the top priority on my list just yet.

> How easy is it to lose data?

Actually, not too easy, as long as hardware doesn't fail.  But it is 
advisable to take some care, of course.

Also, another good "platform test" would be to run the Magma test cases on 
your Mac.  I always ensure they pass on Windows before I post to SqueakMap, 
so if they can't get through on your Mac, there may be a platform-specific 
bug in Squeak somewhere.  They take a while to run, and it's not just a 
click in the TestRunner browser, so let me know if you need any assistance 
getting them set up.

And this also brings up a good point.  The test cases are pretty 
stringent, covering lots of weird combinations and scenarios.  By studying 
those, you can see exactly what Magma is capable of, because it *does* get 
through them all.

Ok, what follows are the boring details for each set of notes.

Magma notes 1:

Here the entire problem is that Magma does not support "relative path" 
file names.  There are just three or four messages in the entire API that 
call for a filename, and I always use fully-qualified names.  I will add 
improving this to my list but, in the meantime, use fully-qualified names 
and all these problems should go away.

MaFilename is a facade for accessing the parts of filenames.  I hope that 
changing its implementation to rely more heavily on FileDirectory 
(you said FileStream; I presume you meant FileDirectory) will help 
relative filenames work on all platforms.  If I've duplicated 
functionality, I probably just didn't see it.  I'll research that and add 
it to my list of improvements.

Magma notes 2:

In this one, the only anomaly I was able to reproduce was the apparent 
lack of commit-conflict detection.  I was very surprised by this at first, 
but I now see why.  I had my MaClientServerPreferences debug set to true, 
which turns off the resignaling of the exceptions that occurred in the 
server (the MagmaCommitError) and simply returns them instead (hmm, now 
I'm trying to remember why I wanted that behavior..).

Please see if you, too, have yours turned on and, if so, execute this:

  MaClientServerPreferences debug: false. MagmaPreferences debug: false

and then it will signal the commit error instead of just returning it.
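With those preferences set to false, a caller could catch the error and retry the transaction, which is what part II asked for. This is only a sketch: maxAttempts and the other temporaries are illustrative, the limit of three is arbitrary, and it assumes that after a failed commit, sending #abort to the session (as in the inspector example in the notes-3 discussion) discards the stale changes so that re-running the commit block starts from fresh data.

```smalltalk
"Retry sketch: re-run the whole transaction block when another
 session's commit conflicts with ours, up to maxAttempts times."
| maxAttempts attempts committed |
maxAttempts := 3.
attempts := 0.
committed := false.
[committed or: [attempts >= maxAttempts]] whileFalse: [
	attempts := attempts + 1.
	[mySession commit: [mySession root at: 'test' put: 7].
	 committed := true]
		on: MagmaCommitError
		do: [:err | "Assumed: abort drops our stale changes before the retry."
			mySession abort]].
committed
```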

Magma notes 3:

Basically, everything you were doing here was simulating what you want to 
do for a web app.  I want to be clear that I have never used nor 
tested Magma for this purpose, but, based on how it's coded, I think it 
*should* work.

Let me try it on Windows.

Ok, I just tried the test where you commit the clock continuously:

  [[mySession commit: [mySession root at: 'now' put: Time now]] repeat] fork

and, in the mySession2 inspector:

  [[self abort. Transcript cr; show: (self root at: 'now')] repeat] fork

I let it run all night long.  This morning, both processes were still 
chugging and the server console shows #objectCount at 84302, so an average 
of about three commits per second.

> Lesson learned: do not save and quit a running magma server.

No, this works too.  In fact, I just killed the server (save and exit 
image). However, there *was* a problem killing the clients after the 
server because they automatically try to disconnect and wait for the 
server response.  I've already fixed that in 1.0beta8.  But bringing the 
server back up and then connecting with clients did work just fine.

> Ok, can connect, but can't get the root - same old strange
> silent-failure business.

There is no silent-failure business.  If you can't get the root, then 
one of the server processes must have been killed, so you're going to be 
flailing in a rut for anything you try going forward.

> Just out of curiosity, try to use the client session while the server
> is down.  My image locks up.  After a while, I manage to interrupt it,
> inside a critical section in MaTcpRequestServerLink>>submit:.

Aha!  You ARE in debug mode, because otherwise the timeout would have 
occurred after 30 seconds.  In debug mode it is set to 2 days so I 
have plenty of time to debug.

> Open the ProcessBrowser to try to kill the process

This is a good way to get Magma in a funky state and, quite possibly, 
stuck in a rut.

 - Chris