Lightweight fibres for TruffleRuby.

1. Disclaimer
2. Lightweight threads, continuations, and Project Loom.
3. Fibres and continuations in Ruby
- 3.1. Fibres
- 3.2. Continuations
4. The differences between these two models
5. Implementing Ruby's model using Loom's
- 5.1. Implementing Ruby fibres
  - 5.1.1. With Loom's Fibres
  - 5.1.2. With Loom's Continuations
- 5.2. Implementing callcc
6. The results
- 6.1. How many fibres can we support?
- 6.2. Scheduling between fibres
7. Conclusion

The MRI Ruby implementation supports heavy weight threads which are pre-emptively scheduled, and lightweight fibres which may run on those threads and are co-operatively scheduled. Fibres have been a problem for JVM Ruby implementations because the VM itself did not have a concept of a fibre, or a lightweight abstraction we could build upon, but that's starting to change with Project loom, let's take a look at what it gives, and how we can use that to implement fibres that can scale beyond even MRI's.

1 Disclaimer

Everything I"m talking about here is based on early prototypes. The APIs and features may change radically before anything nears being production, and there is no fixed timetable for when these feature will arrive in a production JVM except, "When it's done." I contributed the support for Graal to Loom, and implemented the prototype we'll talk about here for TruffleRuby.

2 Lightweight threads, continuations, and Project Loom.

Project Loom is an OpenJDK project with the goal

To explore and incubate Java VM features and APIs built on top of them for the implementation of lightweight user-mode threads (fibres), delimited continuations (of some form), and related features, such as explicit tail-call.

That may sound complex, but let's look at it bit by bit and see what it gives us in Java, and how we can use that in TruffleRuby.

2.1 Lightweight user mode threads

These are currently called java.lang.Fiber in the project loom prototype, and they provide a nice simple model. Fibres are started with a scheduler, and whenever the has nothing to do it parks itself. When the scheduler decided that that fibre has more work it can do then it will schedule it next. The scheduler isn't special, it's a standard java.util.concurrent.Executor and it can use whatever scheduling strategy it likes, across as many threads as it likes. If your fibres are mostly doing IO, or waiting for locks then you probably won't even have to park them explicitly as the Java class library will do that for you.

2.2 Continuations

Continuations are a lower level concept upon which those lightweight threads are built. A continuation can be run, and can yield control back to the function which ran it. If it is run again then it will continue from the point at which it yielded with the same call stack, local variables, and so forth.

2.3 What makes these features more light weight than `java.lang.Thread`?

A normal java.lang.Thread in Hotspot is a normal operating system thread, it has a stack, and will be scheduled by the operating system, and in the JVM it will include data structures for things like thread local variables. Scheduling threads is heavyweight because it has to happen via kernel space and because the kernel only has a very limited idea of which thread it would be sensible to schedule next. Lightweight user mode threads and continuations can involve much less data since we only need to preserve the part of the stack between the call to run the continuation and the point at which it yielded, and can often be very effectively scheduled since the application may often know precisely what to run next.

3 Fibres and continuations in Ruby

Ruby has its own model of fibres and continuations, and they aren't quite the same as Loom's

3.1 Fibres

Ruby has the Fiber class. You create a Fiber with a block like this

f = Fiber.new do |x|
  puts "Fiber called with #{x}"
end

and call it like this

f.resume(1)

which should produce Fiber called with 1 as output. Fibres can explicitly transfer control to another fibre, or yield to the fibre which transferred control to them.

3.2 Continuations

Continuations in Ruby are created using the Kernel#callcc method. This takes a block which will be executed immediately, and create a continuation object which when called will resume execution from the end of the block. They are quite hard to get your head round, so an example is probably helpful. I've adapted this one from the Ruby documentation on continuations.

require "continuation"
def f
  arr = [ "Freddie", "Herbie", "Ron", "Max", "Ringo" ]
  cont = nil
  callcc do |cc| 
    puts 'Creating continuation'
    cont = cc
  end
  puts(message = arr.shift)
  cont.call unless message =~ /Max/
end

If you run this Ruby method it will output the following

Creating continuation
Freddie
Herbie
Ron
Max

What's going on here? Well the method creates an array arr, and a variable cont. It then calls callcc which executes the block, passing in the continuation object as an argument. The block prints out that we're creating a continuation, and assigns it to cont so we can use it later. We then remove the first element of arr and assign it to message and output it. Finally we call the continuation again unless message matches "Max", and this causes the latter half of the method to be run again.

There's quite a lot of debate over whether continuations like this are a good idea, and they certainly aren't widely used in production code, but they are still part of the standard Ruby implementation and we do support them in TruffleRuby.

4 The differences between these two models

As you can see these models differ in a couple of important ways.

In the case of fibres Loom places the responsibility for scheduling on an object outside of the fibres themselves, while Ruby allows each fibre to either explicitly transfer control to another or yield to whichever fibre transferred control to it.

Continuations are even more marked in their differences. Loom's behave in many ways more like Ruby's fibres, with each call and yield advancing the execution state.

5 Implementing Ruby's model using Loom's

So given the quite different nature of these two models can we implement Ruby's fibres and continuations using what Loom provides?

5.1 Implementing Ruby fibres

Most of the code in TruffleRuby concerned with fibres is in a small set of classes, mainly FiberManager which handles the creation of fibres and how control is passed between them. This makes it quite easy for us to prototype new implementations.

5.1.1 With Loom's Fibres

Could we simply replace the use of threads in FiberManager with fibres? Well, we'd need a scheduler that could cope with both the explicit yielding from a Ruby Fiber and other parts of the Java class library park fibres for their own reasons. This is certainly an approach worth investigating, but Loom's fibre API is one area most likely to change, so we'll leave that attempt till later.

5.1.2 With Loom's Continuations

Loom's continuations seem like a much better fit. The FiberManager can represent Ruby Fiber with a java.lang.Continution, and orchestrate the transfer of control between them. Execution can process roughly as follows:

Creating a Fiber
This should create a java.lang.Continuation which will do the following:
1. Fetch any arguments given to the fibre.
2. Set itself as the current fibre.
3. Run the provided code block.
4. Clean up and tell the FiberManager what should be run next.
The fibre won't be run immediately, that can happen when Fiber#resume is called.
Transferring control from the thread into a Fiber

The FiberManager needs to set any arguments to the fibre, and then run the continuation representing it. When control returns back to the FiberManager it should check what to schedule next.
Transferring control from one FIber to another.

The running fibre should set the next fibre to be run on the FiberManager, set any arguments for it, and then yield. Control will pass back to the FiberManager which should run the continuation for the scheduled fibre. That fibre will then set itself as the current fibre on the FiberManager, and get any arguments.
Transferring control from a Fiber to the thread.

This is much like the above case, except the FiberManager will not run another continuation, instead it will simply return the results to its original caller.

5.2 Implementing `callcc`

callcc is harder to map to Loom's fibres or continuations, at least at the moment. If the continuations can be cloned however then it could be done roughly as follows. Each heavyweight thread would need to be run almost entirely inside a java.lang.Continuation, and the control flow would go like this:

Creating the continuation.
1. Create a Ruby Continuation object.
2. Execute the block passed to callcc passing the Ruby continuation in.
3. Place the Ruby continuation somewhere the thread can access it.
4. Yield control to the thread.
5. The thread clones the main java.lang.Continuation and sets it on the Ruby continuation object.
6. The thread returns control to the original java.lang.Continuation.
Calling the continuation.
1. Place the Ruby continuation somewhere the thread can access it.
2. Yield control to the thread.
3. The thread clones the java.lang.Continuation on the Ruby continuation.
4. The thread returns control to this new cloned continuation.

6 The results

So, having implemented a prototype for TruffleRuby the question is, how well does it work? The tests we can run at the moment are a little limited as the Loom prototype has some limitations, but it's enough to test a few things.

6.1 How many fibres can we support?

It's very easy to support a large number of fibres if they are simply initialised but never used, so we want a test that will create a large set of fibres which are active. Something like this should do.

def test_fiber_1(t)
  Fiber.new do |x|
    if x > t
      Fiber.yield x
    else
      f = test_fiber_1(t)
      f.transfer( x + 1 )
    end
  end
end

Using this code I ran some tests to see how many fibres could be supported by different implementations. This was done by making an initial guess and then performing a binary search to find the approximate number of fibres that could be supported without raising an error, or causing the VM to crash completely. MRI 2.5 has made changes to its fibre code and can now support a larger number of fibres. I'll update these results with figures for MRI 2.5 soon.

Ruby implementation	Fibres
MRI 2.4	30000
JRuby 9.1.11.0	3000
TruffleRuby (threads)	3000
TruffleRuby (continuations)	1000000

These results are very encouraging. The only real limit to the number of fibres we can support appears to be memory.

Testing the performance of this is not easy with JRuby or TruffleRuby using threads as neither can clean up threads fast enough to run the test multiple times with large numbers of threads. We can however compare TruffleRuby with continuations to MRI 2.4. The test was run with 10000 fibres 20 times fibres to allow for warm up, and then another 20 times to collect timings.

Ruby implementation	Average time (s)	Standard deviation
MRI 2.4	0.49	0.032
TruffleRuby (continuations)	0.20	0.064

6.2 Scheduling between fibres

Another thing to test is how fast it is to transfer control from one fibre to another. We'll use a very small test which will be completely dominated by the transfer time. Here is the source for this test.

def create_fibers(t)
  fibers = []
  0...t.times do
    fibers << Fiber.new do |x|
      while (true)
        f = fibers[rand(t)]
        Fiber.yield(f)
      end
    end
  end
  fibers
end

def schedule_fibers(t, n)
  fibers = create_fibers(t)
  next_fiber = fibers.first
  0...(n * t).times do
    next_fiber = next_fiber.resume
  end
end

This will create t fibres and transfer control among them t * n times. I had hoped to test this with large numbers of fibres, but again encountered issues with running the test using JRuby or TruffleRuby with threads, so tested with t = 100 and n = 1000

Ruby implementation	Average time (s)	Standard deviation
MRI 2.4	0.26	0.010
JRuby 9.1.11.0	1.26	0.046
TruffleRuby (threads)	1.32	0.082
TruffleRuby (continuations)	2.01	0.31

As you can see, MRI beats all the other implementations easily, and Loom's continuations still have a lot of work to be done on them performance wise. Some experiments with varying the values of t and n suggests that MRI's time is mostly spent creating the fibres, and it's very quick to transfer control between them. All other implementations are dominated by the time to transfer control between fibres and have run times mostly unaffected by the total number of fibres created (at least within the range I could test).

7 Conclusion

Project Loom allows TruffleRuby to support a large number of fibres in a lightweight way, There's still a lot of performance work to do, but these techniques look promising could also be used by JRuby and other JVM languages to support fibres, continuations, and other lightweight concurrency models.