- Therefore, the UI process does not wait on the delay in
WorldState>>interCyclePause:
- Because the UI main loop only does a yield (see Process
class>>spawnNewProcess), the UI process stays runnable all the time, as there
is no other process with p = 40.
- Therefore, no process with p < 40 has a chance to be activated (only
higher ones, which we find in the trace). This also explains why we see 100%
CPU usage while the UI still responds immediately.
This sounds like a reasonable explanation.
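The starvation argument can be illustrated with a toy model. This is only a
sketch in Python, not Squeak code: the Scheduler class and its methods are
hypothetical names, and the priority of 30 for the background process is
chosen purely for illustration. It assumes strict priorities with one FIFO
queue per priority, where a yield moves the running process to the back of
its own queue only.

```python
from collections import deque


class Scheduler:
    """Toy strict-priority scheduler: one FIFO run queue per priority."""

    def __init__(self):
        self.queues = {}  # priority -> deque of runnable process names

    def add(self, name, priority):
        self.queues.setdefault(priority, deque()).append(name)

    def _top(self):
        # Highest priority that currently has a runnable process.
        return max(p for p, q in self.queues.items() if q)

    def running(self):
        """The first process in the highest non-empty queue runs."""
        return self.queues[self._top()][0]

    def yield_(self):
        """Yield: the running process goes to the back of its OWN queue."""
        self.queues[self._top()].rotate(-1)

    def sleep(self):
        """Waiting on a delay: the running process leaves the run queue."""
        return self.queues[self._top()].popleft()


sched = Scheduler()
sched.add("ui", 40)          # UI process, p = 40
sched.add("background", 30)  # any process with p < 40 (30 is an assumption)

trace = []
for _ in range(3):
    trace.append(sched.running())
    sched.yield_()           # no peer at p = 40, so the UI is picked again
print(trace)                 # ['ui', 'ui', 'ui']

sched.sleep()                # only a real wait frees the CPU for p < 40
print(sched.running())       # background
```

In this model a yield never helps the lower-priority process; only taking the
UI process off the run queue, which is what the delay would do, lets it run.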
Now, why does moving the mouse make it run again? I have no idea... my guess
is that the triggered behavior of a mouse move event somehow forces a full GC.
In the trace we see that once the 107th full GC is done, there are far fewer
incremental GCs afterwards. Hence, it is much more likely that the UI process
pauses again.
Tenuring might fix it, too. And it may just be that your wiggling the mouse
creates just enough extra garbage to make the VM tenure.
How could we fix this?
-----------------------------------------
a) Simply increase the 20ms pause defined by MinCycleLapse (at least for
production systems) or tweak the "pause logic". As a test I defined
MinCycleLapse to be 40ms. I could not reproduce the problem anymore.
b) In production, suspend the UI process and only turn it on again when you
need it (we do this via HTTP). This should also improve performance a bit. At
best this is a workaround.
c) Tune the GC policies as they are far from optimal for today's systems (as
John has suggested a couple of times). It seems, though, that this cannot be
guaranteed to fix the problem, but it should make it less likely to happen(?).
d) Don't use processes that run below user scheduling priority. To be honest,
I'm not sure why you'd be running anything below UI priority on a server.
e) Make a preference "lowerPerformance" (or call it "headlessMode" if you wish
:^) and have the effect be that in interCyclePause: you *always* wait for a
certain amount of time (20-50ms). This will ensure that your UI process can't
possibly eat up all of your CPU.
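To compare options a) and e) with the current behavior, here is a rough
back-of-the-envelope model in Python. It is a sketch, not the Squeak
implementation: the function names are hypothetical, it assumes the current
pause logic waits only for the part of MinCycleLapse that the cycle did not
already use, and it assumes a UI cycle consists of nothing but three
incremental GCs of 7ms each (the figures mentioned in this thread).

```python
def remaining_lapse_pause(cycle_ms, min_cycle_lapse_ms=20):
    """Assumed current logic: pause only for the unused part of the lapse."""
    return max(0, min_cycle_lapse_ms - cycle_ms)


def always_pause(forced_ms=20):
    """Option e): pause for a fixed time, however long the cycle took."""
    return forced_ms


def background_share(cycle_ms, pause_ms):
    """Fraction of wall-clock time in which processes with p < 40 can run."""
    return pause_ms / (cycle_ms + pause_ms)


# Assumption: one UI cycle = three incremental GCs of about 7 ms each.
cycle_ms = 3 * 7

scenarios = [
    ("current, MinCycleLapse = 20", remaining_lapse_pause(cycle_ms, 20)),
    ("a) MinCycleLapse = 40", remaining_lapse_pause(cycle_ms, 40)),
    ("e) always pause 20", always_pause(20)),
]

for label, pause_ms in scenarios:
    share = background_share(cycle_ms, pause_ms)
    print(f"{label}: pause {pause_ms} ms, background share {share:.1%}")
```

Under these assumptions the current logic leaves 0.0% for lower priorities,
a) leaves 47.5%, and e) leaves 48.8%. The difference is that a) drops back to
zero as soon as a cycle takes 40ms or more, while e) never does.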
I'd be interested in getting feedback on:
- whether the explanation sounds plausible
It does. There is, however, the question of what could possibly generate
enough load on the garbage collector to run IGCs that take 7ms on average,
three of them in a single UI cycle.
- whether the fix (e.g., a)) works for other people that have this problem.
- what may be a good fix
I'd probably go with option e) above since it ensures that there is always
time left for the lower priority processes (and you don't have to change other
code). Everything else seems risky since you can't say for sure if there isn't
anything that keeps the UI in a busy-loop.