Lightweight fibres for TruffleRuby.
The MRI Ruby implementation supports heavyweight threads, which are pre-emptively scheduled, and lightweight fibres, which may run on those threads and are co-operatively scheduled. Fibres have been a problem for JVM Ruby implementations because the VM itself did not have a concept of a fibre, or a lightweight abstraction we could build upon, but that's starting to change with Project Loom. Let's take a look at what it gives us, and how we can use that to implement fibres that can scale even beyond MRI's.
1 Disclaimer
Everything I"m talking about here is based on early prototypes. The APIs and features may change radically before anything nears being production, and there is no fixed timetable for when these feature will arrive in a production JVM except, "When it's done." I contributed the support for Graal to Loom, and implemented the prototype we'll talk about here for TruffleRuby.
2 Lightweight threads, continuations, and Project Loom.
Project Loom is an OpenJDK project with the goal:

> To explore and incubate Java VM features and APIs built on top of them for the implementation of lightweight user-mode threads (fibres), delimited continuations (of some form), and related features, such as explicit tail-call.
That may sound complex, but let's look at it bit by bit and see what it gives us in Java, and how we can use that in TruffleRuby.
2.1 Lightweight user mode threads
These are currently called `java.lang.Fiber` in the Project Loom prototype, and they provide a nice, simple model. Fibres are started with a scheduler, and whenever a fibre has nothing to do it parks itself. When the scheduler decides that a fibre has more work it can do, it will schedule it next. The scheduler isn't special: it's a standard `java.util.concurrent.Executor`, and it can use whatever scheduling strategy it likes, across as many threads as it likes. If your fibres are mostly doing IO, or waiting for locks, then you probably won't even have to park them explicitly, as the Java class library will do that for you.
2.2 Continuations
Continuations are a lower level concept upon which those lightweight threads are built. A continuation can be run, and can yield control back to the function which ran it. If it is run again then it will continue from the point at which it yielded with the same call stack, local variables, and so forth.
2.3 What makes these features more lightweight than java.lang.Thread?
A normal `java.lang.Thread` in HotSpot is a normal operating system thread: it has a stack, it will be scheduled by the operating system, and in the JVM it will include data structures for things like thread-local variables. Scheduling threads is heavyweight because it has to happen via kernel space, and because the kernel only has a very limited idea of which thread it would be sensible to schedule next. Lightweight user-mode threads and continuations can involve much less data, since we only need to preserve the part of the stack between the call to run the continuation and the point at which it yielded, and they can often be scheduled very effectively, since the application may know precisely what to run next.
3 Fibres and continuations in Ruby
Ruby has its own model of fibres and continuations, and they aren't quite the same as Loom's.
3.1 Fibres
Ruby has the `Fiber` class. You create a `Fiber` with a block like this:

```ruby
f = Fiber.new do |x|
  puts "Fiber called with #{x}"
end
```

and call it like this:

```ruby
f.resume(1)
```

which should produce `Fiber called with 1` as output. Fibres can explicitly transfer control to another fibre, or yield to the fibre which transferred control to them.
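To make the resume/yield hand-off concrete, here is a minimal sketch (plain Ruby, nothing TruffleRuby-specific; the counter fibre is just an invented example) of a fibre yielding values back to whichever fibre resumed it:

```ruby
# Each resume runs the fibre until the next Fiber.yield, and the value
# given to Fiber.yield becomes the return value of that resume call.
counter = Fiber.new do |start|
  x = start
  loop do
    Fiber.yield x
    x += 1
  end
end

puts counter.resume(10)  # => 10
puts counter.resume      # => 11
puts counter.resume      # => 12
```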
3.2 Continuations
Continuations in Ruby are created using the `Kernel#callcc` method. This takes a block which will be executed immediately, and creates a continuation object which, when called, will resume execution from the point where the block ends. They are quite hard to get your head round, so an example is probably helpful. I've adapted this one from the Ruby documentation on continuations.
require "continuation" def f arr = [ "Freddie", "Herbie", "Ron", "Max", "Ringo" ] cont = nil callcc do |cc| puts 'Creating continuation' cont = cc end puts(message = arr.shift) cont.call unless message =~ /Max/ end
If you run this Ruby method it will output the following
```
Creating continuation
Freddie
Herbie
Ron
Max
```
What's going on here? Well, the method creates an array `arr` and a variable `cont`. It then calls `callcc`, which executes the block, passing in the continuation object as an argument. The block prints out that we're creating a continuation, and assigns it to `cont` so we can use it later. We then remove the first element of `arr`, assign it to `message`, and output it. Finally we call the continuation unless `message` matches "Max", and this causes the latter half of the method to be run again.
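Another way to get a feel for what `callcc` gives you is to use the captured continuation as an early exit from a nested computation. This is just an illustrative sketch (the method name `find_first_negative` is invented for this example), not code from TruffleRuby:

```ruby
require "continuation"

# Calling the continuation makes the whole callcc expression return
# immediately with the value we pass, unwinding out of the iteration.
def find_first_negative(numbers)
  callcc do |exit_cont|
    numbers.each { |n| exit_cont.call(n) if n.negative? }
    nil  # value of the callcc expression if no negative number is found
  end
end

puts find_first_negative([3, 1, -4, 1, -5])  # => -4
```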
There's quite a lot of debate over whether continuations like this are a good idea, and they certainly aren't widely used in production code, but they are still part of the standard Ruby implementation and we do support them in TruffleRuby.
4 The differences between these two models
As you can see, these models differ in a couple of important ways.
In the case of fibres, Loom places the responsibility for scheduling on an object outside of the fibres themselves, while Ruby allows each fibre to either explicitly transfer control to another or yield to whichever fibre transferred control to it.
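For a feel of the Ruby side of that difference, here is a minimal transfer sketch (assuming MRI 2.x, where `Fiber#transfer` needs `require "fiber"`); there is no scheduler object anywhere, each fibre names the next fibre itself:

```ruby
require "fiber"

ping = pong = nil

ping = Fiber.new do
  3.times do
    puts "ping"
    pong.transfer  # hand control directly to the other fibre
  end
end

pong = Fiber.new do
  3.times do
    puts "pong"
    ping.transfer
  end
end

ping.transfer  # prints ping and pong alternately, then returns here
```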
Continuations are even more marked in their differences. Loom's behave in many ways more like Ruby's fibres, with each run and yield advancing the execution state, whereas a Ruby continuation created by `callcc` jumps back to the same captured point however many times it is called.
5 Implementing Ruby's model using Loom's
So, given the quite different nature of these two models, can we implement Ruby's fibres and continuations using what Loom provides?
5.1 Implementing Ruby fibres
Most of the code in TruffleRuby concerned with fibres is in a small set of classes, mainly `FiberManager`, which handles the creation of fibres and how control is passed between them. This makes it quite easy for us to prototype new implementations.
5.1.1 With Loom's Fibres
Could we simply replace the use of threads in `FiberManager` with fibres? Well, we'd need a scheduler that could cope with both the explicit yielding from a Ruby `Fiber` and with other parts of the Java class library parking fibres for their own reasons. This is certainly an approach worth investigating, but Loom's fibre API is one of the areas most likely to change, so we'll leave that attempt till later.
5.1.2 With Loom's Continuations
Loom's continuations seem like a much better fit. The `FiberManager` can represent each Ruby `Fiber` with a `java.lang.Continuation`, and orchestrate the transfer of control between them. Execution can proceed roughly as follows (a toy Ruby model of this hand-off is sketched after the list):
- Creating a `Fiber`
  This should create a `java.lang.Continuation` which will do the following:
  - Fetch any arguments given to the fibre.
  - Set itself as the current fibre.
  - Run the provided code block.
  - Clean up and tell the `FiberManager` what should be run next.
  The fibre won't be run immediately; that happens when `Fiber#resume` is called.
- Transferring control from the thread into a `Fiber`
  The `FiberManager` needs to set any arguments to the fibre, and then run the continuation representing it. When control returns back to the `FiberManager` it should check what to schedule next.
- Transferring control from one `Fiber` to another
  The running fibre should set the next fibre to be run on the `FiberManager`, set any arguments for it, and then yield. Control will pass back to the `FiberManager`, which should run the continuation for the scheduled fibre. That fibre will then set itself as the current fibre on the `FiberManager`, and get any arguments.
- Transferring control from a `Fiber` to the thread
  This is much like the above case, except the `FiberManager` will not run another continuation; instead it will simply return the results to its original caller.
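To make that hand-off concrete, here is a toy, runnable Ruby model of the flow above. Ruby's own `Fiber` stands in for `java.lang.Continuation` purely to show the control flow; `ToyFiberManager` and its methods are invented for this sketch and are not TruffleRuby's real classes.

```ruby
# Ruby's Fiber stands in for java.lang.Continuation; ToyFiberManager is a
# made-up illustration of the orchestration, not TruffleRuby's FiberManager.
class ToyFiberManager
  def initialize
    @next_fiber = nil   # continuation the current fibre asked us to run next
    @args       = nil   # arguments being handed across the transfer
  end

  # Creating a fibre: the continuation fetches its arguments and runs the block.
  def create(&block)
    Fiber.new { block.call(*@args) }
  end

  # Transferring control from the thread into a fibre: set the arguments and
  # run the continuation; when control comes back, keep running whatever the
  # last fibre scheduled until nothing is left.
  def resume(cont, *args)
    @args  = args
    result = cont.resume
    while (nxt = @next_fiber)
      @next_fiber = nil
      result = nxt.resume
    end
    result
  end

  # Transferring control from one fibre to another: schedule the target on the
  # manager, then yield so the manager can run it.
  def transfer(cont, *args)
    @args       = args
    @next_fiber = cont
    Fiber.yield
  end
end

manager = ToyFiberManager.new
b = manager.create { |x| puts "b got #{x}" }
a = manager.create { |x| puts "a got #{x}"; manager.transfer(b, x + 1) }
manager.resume(a, 1)   # prints "a got 1" then "b got 2"
```

The important shape is that a fibre never runs another fibre directly: it only schedules it on the manager and yields, and the manager decides what to run next.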
5.2 Implementing callcc
`callcc` is harder to map to Loom's fibres or continuations, at least at the moment. If continuations can be cloned, however, then it could be done roughly as follows. Each heavyweight thread would need to be run almost entirely inside a `java.lang.Continuation`, and the control flow would go like this:
- Creating the continuation.
  - Create a Ruby `Continuation` object.
  - Execute the block passed to `callcc`, passing the Ruby continuation in.
  - Place the Ruby continuation somewhere the thread can access it.
  - Yield control to the thread.
  - The thread clones the main `java.lang.Continuation` and sets it on the Ruby continuation object.
  - The thread returns control to the original `java.lang.Continuation`.
- Calling the continuation.
  - Place the Ruby continuation somewhere the thread can access it.
  - Yield control to the thread.
  - The thread clones the `java.lang.Continuation` on the Ruby continuation.
  - The thread returns control to this new cloned continuation.
6 The results
So, having implemented a prototype for TruffleRuby, the question is: how well does it work? The tests we can run at the moment are a little limited, as the Loom prototype has some limitations of its own, but it's enough to test a few things.
6.1 How many fibres can we support?
It's very easy to support a large number of fibres if they are simply initialised but never used, so we want a test that will create a large set of fibres which are active. Something like this should do.
```ruby
require "fiber"  # Fiber#transfer needs this on MRI 2.x

# Build a chain of active fibres: each fibre creates the next one and
# transfers control to it until the counter passes the threshold t.
def test_fiber_1(t)
  Fiber.new do |x|
    if x > t
      Fiber.yield x
    else
      f = test_fiber_1(t)
      f.transfer(x + 1)
    end
  end
end
```
Using this code I ran some tests to see how many fibres could be supported by different implementations. This was done by making an initial guess and then performing a binary search to find the approximate number of fibres that could be supported without raising an error, or causing the VM to crash completely. MRI 2.5 has made changes to its fibre code and can now support a larger number of fibres. I'll update these results with figures for MRI 2.5 soon.
| Ruby implementation | Fibres |
| --- | --- |
| MRI 2.4 | 30000 |
| JRuby 9.1.11.0 | 3000 |
| TruffleRuby (threads) | 3000 |
| TruffleRuby (continuations) | 1000000 |
These results are very encouraging. The only real limit to the number of fibres we can support appears to be memory.
Testing the performance of this is not easy with JRuby or TruffleRuby using threads, as neither can clean up threads fast enough to run the test multiple times with large numbers of threads. We can, however, compare TruffleRuby with continuations to MRI 2.4. The test was run with 10000 fibres, 20 times to allow for warm-up, and then another 20 times to collect timings.
| Ruby implementation | Average time (s) | Standard deviation |
| --- | --- | --- |
| MRI 2.4 | 0.49 | 0.032 |
| TruffleRuby (continuations) | 0.20 | 0.064 |
6.2 Scheduling between fibres
Another thing to test is how fast it is to transfer control from one fibre to another. We'll use a very small test which will be completely dominated by the transfer time. Here is the source for this test.
```ruby
def create_fibers(t)
  fibers = []
  t.times do
    fibers << Fiber.new do |x|
      while true
        # Pick a random fibre and yield it back to the scheduling loop.
        f = fibers[rand(t)]
        Fiber.yield(f)
      end
    end
  end
  fibers
end

def schedule_fibers(t, n)
  fibers = create_fibers(t)
  next_fiber = fibers.first
  (n * t).times do
    # Each resume returns the next fibre to transfer control to.
    next_fiber = next_fiber.resume
  end
end
```
This will create `t` fibres and transfer control among them `t * n` times. I had hoped to test this with large numbers of fibres, but again encountered issues running the test with JRuby or TruffleRuby with threads, so I tested with `t = 100` and `n = 1000`.
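Concretely, assuming the methods defined above, the timed run amounts to something like:

```ruby
schedule_fibers(100, 1000)  # 100 fibres, 100 * 1000 = 100000 control transfers
```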
| Ruby implementation | Average time (s) | Standard deviation |
| --- | --- | --- |
| MRI 2.4 | 0.26 | 0.010 |
| JRuby 9.1.11.0 | 1.26 | 0.046 |
| TruffleRuby (threads) | 1.32 | 0.082 |
| TruffleRuby (continuations) | 2.01 | 0.31 |
As you can see, MRI beats all the other implementations easily, and Loom's continuations still have a lot of performance work ahead of them. Some experiments with varying the values of `t` and `n` suggest that MRI's time is mostly spent creating the fibres, and that it's very quick to transfer control between them. All the other implementations are dominated by the time to transfer control between fibres, and have run times mostly unaffected by the total number of fibres created (at least within the range I could test).
7 Conclusion
Project Loom allows TruffleRuby to support a large number of fibres in a lightweight way. There's still a lot of performance work to do, but these techniques look promising, and could also be used by JRuby and other JVM languages to support fibres, continuations, and other lightweight concurrency models.