I'm going to be looking at async functions and generators over the next little while, which requires I understand them. My previous experience has been that writing about things in a structured fashion helps me learn-by-teaching. So this blog post (series?) is me learning about generators by teaching.
We'll start from the basics of the language feature, and then continue into more specifics.
Basics of Generators
Generators are special objects that return a sequence of values when you call their next()
method. The sequence of values provided is specified by code, which backs a generator.
To create a JS Generator, you use the function*
syntax:
function* gen() {
yield 1;
yield 2;
yield 3;
}
This function, when called, instead of running, returns a generator.
var g = gen();
At this point, none of the code backing the generator has run yet. Once you invoke the next
method on the generator g
, then the body of the function runs until a yield
. At the yield
point, execution of the function body stops, and the caller of next
is returned an object with value
and done
properties: value
is the argument to the yield
, and done
is false
if there is more results to be yield
ed, and true
if the generator is finished. Subsequent calls to next
will simply return {value: undefined, done: true}
.
g.next(); // {value: 1, done: false}
g.next(); // {value: 2, done: false}
g.next(); // {value: 3, done: false}
g.next(); // {value: undefined, done: true}
g.next(); // {value: undefined, done: true}
This protocol is understood by JavaScript features, like for ... of
loops:
let res = []
for (v of gen()) {
res.push(v);
}
v; // [1,2,3]
When you call .next()
it's possible to provide an argument. That becomes the value of the yield
expression when the generator is resumed.
Investigating a basic generator:
Let's look at the bytecode for a generator function. With a debug build of SpiderMonkey, you can dump bytecode with the dis
function: dis(gen)
produces the following fairly substantial chunk of Bytecode
flags: NEEDS_CALLOBJECT
loc op
----- --
main:
00000: Generator # GENERATOR
00001: SetAliasedVar ".generator" (hops = 0, slot = 2) # GENERATOR
00006: InitialYield 0 # RVAL GENERATOR RESUMEKIND
00010: AfterYield (ic: 1) # RVAL GENERATOR RESUMEKIND
00015: CheckResumeKind # RVAL
00016: Pop #
00017: NewObject ({value:(void 0), done:(void 0)}) # OBJ
00022: One # OBJ 1
00023: InitProp "value" # OBJ
00028: False # OBJ false
00029: InitProp "done" # OBJ
00034: GetAliasedVar ".generator" (hops = 0, slot = 2) # OBJ .generator
00039: Yield 1 # RVAL GENERATOR RESUMEKIND
00043: AfterYield (ic: 5) # RVAL GENERATOR RESUMEKIND
00048: CheckResumeKind # RVAL
00049: Pop #
00050: NewObject ({value:(void 0), done:(void 0)}) # OBJ
00055: Int8 2 # OBJ 2
00057: InitProp "value" # OBJ
00062: False # OBJ false
00063: InitProp "done" # OBJ
00068: GetAliasedVar ".generator" (hops = 0, slot = 2) # OBJ .generator
00073: Yield 2 # RVAL GENERATOR RESUMEKIND
00077: AfterYield (ic: 9) # RVAL GENERATOR RESUMEKIND
00082: CheckResumeKind # RVAL
00083: Pop #
00084: NewObject ({value:(void 0), done:(void 0)}) # OBJ
00089: Int8 3 # OBJ 3
00091: InitProp "value" # OBJ
00096: False # OBJ false
00097: InitProp "done" # OBJ
00102: GetAliasedVar ".generator" (hops = 0, slot = 2) # OBJ .generator
00107: Yield 3 # RVAL GENERATOR RESUMEKIND
00111: AfterYield (ic: 13) # RVAL GENERATOR RESUMEKIND
00116: CheckResumeKind # RVAL
00117: Pop #
00118: NewObject ({value:(void 0), done:(void 0)}) # OBJ
00123: Undefined # OBJ undefined
00124: InitProp "value" # OBJ
00129: True # OBJ true
00130: InitProp "done" # OBJ
00135: SetRval #
00136: GetAliasedVar ".generator" (hops = 0, slot = 2) # .generator
00141: FinalYieldRval #
00142: RetRval # !!! UNREACHABLE !!!
We'll go through this piece by piece to try to understand what's going on.
00000: Generator # GENERATOR
00001: SetAliasedVar ".generator" (hops = 0, slot = 2) # GENERATOR
Reading this, on the left we have the Bytecode Index (0000
), the opcode in the middle Generator
, and on the right, after the #
, we have the statically determined stack contents.
To understand what opcodes do, the best reference is Opcodes.h in the SpiderMonkey sources, as well as the interpreter implementation of the opcode.
These two opcodes together create a Generator object, and create a binding for the generator under the name .generator
, for future access. We use .generator
as the name because we know it will never conflict with a user defined JS name because there's no valid syntax to create that name.
00006: InitialYield 0 # RVAL GENERATOR RESUMEKIND
InitialYield
does three main things: First, it makes the Generator object, allocated above by the Generator opcode, the return value. Then, it calls AbstractGeneratorObject::initialSuspend
, after which it pops the current frame off the stack (returning to the caller). We'll discuss the suspend
operation shortly.
The contract for generators bytecode is that at the time of generator resumption the stack will be updated to have on it:
- The 'result' value of the generator (ie,
y
, in y = yield x;
). This is injected as the argument to .next(...)
.
- The generator object
- The resume kind: this indicates if the generator was resumed with
.next()
, .throw
or .return
.
00010: AfterYield (ic: 1) # RVAL GENERATOR RESUMEKIND
00015: CheckResumeKind # RVAL
00016: Pop #
AfterYield
is a book-keeping operation, which we will skip for now.
CheckResumeKind
reads the generator resume kind and either: 1) Continues to the next instruction, if the resume kind is next
, 2) Throws an exception if the resume kind is either throw
or return
. The return value is what is on the stack afterwards. Because this was an 'initial yield', there's nothing to consume this return value, so we simply pop it off the stack.
00017: NewObject ({value:(void 0), done:(void 0)}) # OBJ
00022: One # OBJ 1
00023: InitProp "value" # OBJ
00028: False # OBJ false
00029: InitProp "done" # OBJ
00034: GetAliasedVar ".generator" (hops = 0, slot = 2) # OBJ .generator
00039: Yield 1 # RVAL GENERATOR RESUMEKIND
Now that we've gotten through the code executed before the first invocation of next()
, we can see the code executed on our first call to next()
:
NewObject
allocates the return value of the generator.
One
pushes 1 onto the stack, and InitProp
sets the value
property on that object to 1.
False
and InitProp
set the done
property on the object to false
GetAliasedVar` retrieves the generator from the environment, and pushes it onto the stack.
Yield
suspends the generator, returning its argument to the caller. Following the same contract for InitialYield
, when execution resumes after this bytecode, the stack will have the argument to next/throw/return
, the generator, and the resume kind on the stack.
Seeing the above pattern, you can probably easily pick out the next three yield
s, so I will skip those.
00118: NewObject ({value:(void 0), done:(void 0)}) # OBJ
00123: Undefined # OBJ undefined
00124: InitProp "value" # OBJ
00129: True # OBJ true
00130: InitProp "done" # OBJ
00135: SetRval #
00136: GetAliasedVar ".generator" (hops = 0, slot = 2) # .generator
00141: FinalYieldRval #
00142: RetRval # !!! UNREACHABLE !!!
Once we have exhausted the yield
s, we get to the end of the generator. At this point, we prepare the {value: returnValue, done: true}
object. This is stored into the return value slot, then returned via FinalYieldRval
, which closes the generator object and then returns.
Suspend
Suspending a generator means saving the state of the stack frame so that it can be restored. The relevant state here is the state of the expression / interpreter stack.
In a slightly more complicated generator we can see this happen:
function *gen2() {
return 10 + (yield 1)
}
When we disassemble this (dis(gen2)
) we can see the relevant bits here:
00017: NewObject ({value:(void 0), done:(void 0)}) # OBJ
00022: Int8 10 # OBJ 10
00024: NewObject ({value:(void 0), done:(void 0)}) # OBJ 10 OBJ
00029: One # OBJ 10 OBJ 1
00030: InitProp "value" # OBJ 10 OBJ
00035: False # OBJ 10 OBJ false
00036: InitProp "done" # OBJ 10 OBJ
00041: GetAliasedVar ".generator" (hops = 0, slot = 2) # OBJ 10 OBJ .generator
00046: Yield 1 # OBJ 10 RVAL GENERATOR RESUMEKIND
00050: AfterYield (ic: 6) # OBJ 10 RVAL GENERATOR RESUMEKIND
00055: CheckResumeKind # OBJ 10 RVAL
00056: Add # OBJ (10 + RVAL)
The first NewObject
pushed is the one for the return value; the second is the one returned by the yield
. If you look at the yield, you can see that the return object and the literal 10 are still on the stack when we yield. These values need to be saved on suspend.
For that, the method AbstractGeneratorObject::suspend
exists. This method has three main responsibilities.
- Keeping track of the resume index: this is where the generator should start executing on resumption.
- Keeps track of the frame's environment chain. I'll discuss the environment chain in a moment.
- Copying the values out of the interpreter stack into an array stored on the generator object (possibly recycling a previously allocated array, extending it if necessary).
As near as I can tell, the major difference between InitialYield
and Yield
is that InitialYield
is made aware that a priori we know we will have nothing on the expression stack to be saved.
The environment chain needs to be maintained by the generator as it will change over the execution of a generator. For example:
function *gen_with_environment(x) {
if (x) {
var y = 10;
let z = 12;
yield 1;
yield z + 1;
}
yield 2;
}
The binding let z
binding is only available inside of the lexical block defined by the conditional braces. This is managed within the engine by creating a new lexical environment object for the block with the braces, which is made the 'current' environment when the braces are entered. As a result, when we yield, we need to know which lexical environment to restore. The same is not true of the var y
binding, which would not create a new environment, as the language hoists var
bindings to the top of function definitions.
It's worth noting that environments are mutable in JavaScript, as a direct eval
is allowed to add new bindings:
function* g() {
eval('var b = 10;')
yield b;
}
so we must keep track of the precise environment to be restored, as it may have been mutated.
Resume and .next()
The above essentially covered how generators are created, and how we leave a generator frame. To see how we get back in, we want to look at the implementation of GeneratorNext
, which is self-hosted JavaScript that implements Generator.prototype.next
. Self-hosted JS is a special dialect of JS implemented in SpiderMonkey with elevated permissions that is written specifically for engine functionality.
function GeneratorNext(val) {
// The IsSuspendedGenerator call below is not necessary for correctness.
// It's a performance optimization to check for the common case with a
// single call. It's also inlined in Baseline.
if (!IsSuspendedGenerator(this)) {
if (!IsObject(this) || !IsGeneratorObject(this))
return callFunction(CallGeneratorMethodIfWrapped, this, val, "GeneratorNext");
if (GeneratorObjectIsClosed(this))
return { value: undefined, done: true };
if (GeneratorIsRunning(this))
ThrowTypeError(JSMSG_NESTING_GENERATOR);
}
try {
return resumeGenerator(this, val, "next");
} catch (e) {
if (!GeneratorObjectIsClosed(this))
GeneratorSetClosed(this);
throw e;
}
}
Each function largely does what you would imagine it to do. However, there are some interesting pieces. For example return resumeGenerator(this, val, "next")
gets compiled directly to the Resume
bytecode, not a function call.
The Resume
bytecode calls AbstractGeneratorResume
, which takes the previously saved expression stack, restores it to the machine stack, and sets the program counter to the correct value (as determined by the resume index stored in the generator).
Given the previously discussed Yield
protocol, it also pushes the argument to .next()
, the generator object, and the resume kind.
More Functionality: return
, throw
Generator.prototype
has two more methods than just .next()
: There's also .return
and .throw
.
gen.return(x)
closes the generator, and returns {val: x, done: true}
gen.throw(x)
enters the generator, and throws x
(which may be caught by the body of the generator and handled.
The implementation of these are both in builtin/Generator.js
. Both of which use resumeGenerator
(JSOp::Resume
), just with different types and return values. These are then handled by the CheckResumeKind
op as appropriate.
Conclusion
Having started with no knowledge of how Generators work in SpiderMonkey, I was pleasantly surprised to find the high level design mostly made sense to me. Things I didn't cover here that might be worth exploring:
- The JIT implementation of generators
- Async functions are implemented with generators: How does that work?
Acknowledgements:
Many thanks to Iain Ireland, Steve Fink, and Jeff Walden for proof reading this post. Mistakes are mine, feedback very welcome!