1. Compiling Unlambda
With a simple hand-written LL(1) parser, Unlambda source code can be trivially parsed into AST, which takes the form of a full binary tree, having "apply" operators as internal nodes and primitive functions as leaf nodes.
The following image shows the AST parsed from a
cat program (code:
1.1 Basic Compilation Rules
Execution of an Unlambda program is basically a combination of "eval" and "apply" operations. One of the primary goal of compilation is to eliminate "eval", as AST is a static structure and precedence of "apply" operations can be determined at compile-time. However, behavior of "apply" depends on the function and the argument, whose implementation detail should not interest the compiler, are left to the runtime system.
Within a single pre-order traversal, every internal node of Unlambda AST are compiled to 1 or 2 instructions, by following the rules depending on the types of its children, which are shown below (X, Y are primitive functions):
- We just need to apply function X to Y. Compiles to
APP X, Y. (See #2.2 for instruction set details)
- Before X can be applied to its argument, the right child of this AST node has to be evaluated. Compiles to
NOP(doing nothing), then push
APP X, accto a temporary compile-time stack.
- Before the left child of this AST can be applied to Y, it has to be evaluated to a function. Compiles to
NOP, then push
APP acc, Yto the stack.
- Compiles to
NOP. Both child of this AST has to be evaluated. However, unlike (2), the compiler has no idea what function the left child evaluated to when the right child completes evaluation, so we push
SAto the stack. Whenever an
SAinstruction is popped from the stack, an
LAoperation should be pushed to the stack.
After each (1), continuously pop instructions from the stack and append it to the list of compiled instructions, until one of the following conditions is met:
SAoperation is just popped from the stack.
- The stack is empty (which means the compilation is completed).
d and Promise
As you may have noticed, there's an obvious flaw in the said compilation rules - the special behaviour of function
d wasn't taken into consideration. According to the Unlambda language specs, whenever an "operator" (the left child of an internal AST node) evaluates to function
d, evaluation of the "operand" (its right sibling) is postponed and saved to a "promise", which may evaluate further in the future.
For rule (2), the compiler knows whether X is function
d. When it is
d, every instruction between the
APP d, acc should be skipped, thus it compiles to
DEL offset, where the "offset" points to the instruction next to the
APP d, acc. The offset of the instruction next to
DEL offset should be stored into the promise, so that the skipped instructions can execute in the future. Meanwhile,
APP d, acc will never happen, so we change it into
LA and introduce an internal function
f ("finalize"), which applies the argument to the element popped from the stack, and restores the instruction pointer it saved. The function
f is pushed to the runtime stack whenever the promise is applied to an argument, which should be pushed to the stack beforehand.
For rule (4), the left child may evaluate to
d. If not,
SA behaves normally (saves accumulator), otherwise, it behaves like
DEL, which requires it to take an offset as operand as well.
Rules (1) and (3) are not affected, as there are no evaluations left to postpone, and
d can be applied as a normal function.
The following image shows the compilation of the
cat program mentioned above:
Note that an
APP e, acc instruction is appended to the end of program as a guard to explicitly terminate the program upon end of execution.
1.3 Substituted Application
``sXY function differs from any existing Unlambda functions, for it's not applied to its argument directly. Instead, it performs a "substituted application". When
``sXY is applied to function Z, the evaluation result of
``XZ`YZ is returned. Extra evaluation means extra instructions have to be generated. A special code section for "substituted application" is shown below:
0x00: LA // `XZ -> U 0x01: XCH 0x02: LA // `YZ -> V 0x03: LA // `UV 0x04: LA
``sXY is applied to an argument Z, we successively push function
j, Z, Y and X into the runtime stack, and set the instruction pointer to the first line of the instructions above. Function
j ("jump") is another internal function we introduced, which acts like
i, with the side affect of restoring the instruction pointer it stores. Thus, execution can proceed to the main code section after substituted application is completed.
However, result of
`XZ may be function
d, which postpones the
`YZ evaluation. The task of testing the value of U falls to
XCH. Unlike the cases in #1.2, where the size of postponed instructions varies, here only 1 instruction can be skipped. Meanwhile, promise is created by popping two elements from the stack, rather than storing pointer to the line of instruction.
Another problem is when
`XZ evaluates to another
``sXY function, it will be applied by the
LA instruction at 0x03 (similar to a tail call). If done indefinitely, like in
```sii``sii, the stack will be filled with
j functions storing the address of 0x04 until memory is exhausted. Those
j functions are meant to be consumed by the
LA instruction at 0x04, but unfortunately, it exists outside the loop and never gets executed. TCO can be performed at runtime by checking the value of the instruction pointer when applying
``sXY. If it equals to 0x03, do not push
j into the stack.
Quite a few compile-time optimizations can be employed to improve runtime performance, which are not limited to the following examples.
1.4.1 Constant Folding & Propagation
Certain Unlambda expressions have constant evaluation results, such as
``kXY (always yields X), and
`i`i`ii (evaluates to
i). These expressions can be recognized and evaluated at compile-time.
Another notable case is printing strings. In Unlambda, only one character can be printed per
.x function call. To perform string printing, chaining of the
.x function is required. For example, a program which prints "hello" (
`.o`.l`.l`.e`.hv ) compiles to:
0x00: APP .h, v 0x01: APP .e, acc 0x02: APP .l, acc 0x03: APP .l, acc 0x04: APP .o, acc
By storing "hello" as a constant, and introducing a
p ("print") internal function (which stores the pointer to the string constant and behaves like
i with a side affect of printing the string), the instructions can be optimized to:
0x00: LC<print> 0x00 // Stores function p to accumulator 0x01: APP acc, v
1.4.2 Dead Code Elimination
APP instruction with an immediate
e as first operand:
- If it lies within a
SAblock, all instructions after this instruction and before the instruction referenced by
SAare never reached and can be eliminated.
- Otherwise, all instructions after it can be eliminated.
2. The Unlambda Virtual Machine
The Unlambda VM is a process virtual machine specially designed and optimized for the Unlambda programming language.
2.1 Architecture Overview
The Unlambda VM utilizes an accumulator-based architecture, having the "accumulator" as the general purpose register, while the other three registers being the instruction pointer, the stack pointer, and the "current character". The "current character" is an Unlambda-specific concept, which is used internally by some primitive I/O functions of Unlambda.
The ".text" and ".rodata" sections, which remain read-only at runtime, are loaded from bytecode file when VM initializes. Instructions are located to the ".text" section, and constants to the ".rodata" section. Although there's no such concept as a "constant" in the Unlambda language itself, there are a few compile-time optimization techniques where constants are required.
Dynamic memory allocation is required when
``sXY functions, promises or continuations are created, to store
Y and pointers. An object pool is introduced, due to inefficiency of frequent
free()s of small, fixed-size blocks of memory. Objects in the pool are safely reference counted, for all of them are immutable by design, creating no reference cycles.
The Unlambda VM takes advantage of segmented stacks to lower the overhead of context-switching. During continuation capture and reinstatement, only the "active" stack segment (referenced by the stack pointer) need to be copied/freed, rather than the entire stack. Stack segments are reference counted to employ copy-on-write, and they can be pooled as well to alleviate potential performance degradation caused by the "hot split" issue.
2.2 Instruction Set
An Unlambda VM instruction consists of an opcode followed by 0 to 2 operands. There are a total of 6 different instructions, which are listed below.
(immediate / accumulator)
|Applies the first operand to the second one, and save the result to the accumulator.|
(Load and Apply)
|None||Pop the top element from the stack, and apply it to the value of the accumulator. The result is saved into the accumulator.|
|If the accumulator value is function
If not, Push the value of the accumulator to the stack.
|Create a promise which stores the current instruction pointer, save the promise to the accumulator, and set the instruction pointer to the operand value.|
|Load constant from ".rodata" section starting from the address specified by the operand. This operation requires an "extended opcode" where the purpose of loaded constant is specified. Accumulator value is expected to change after this operation.|
|None||If the accumulator value is function
If not, swap the value between the accumulator and the element next to the stack top.
* Note: The
XCH instruction is reserved for internal use only, and should never appear in bytecode files.
2.3 Bytecode File Format
Currently, the Unlambda bytecode file consists of 4 consecutive sections:
|File header||Metadata of the bytecode file|
|Program header||Metadata of the Unlambda program|
|0x01||1||Major version number|
|0x02||1||Minor version number|
|0x03||1||Size of program header (bytes)|
|0x00||4||Size of ".text" section (bytes)|
|0x04||4||Size of ".rodata" section (bytes)|
2x function or 1x address (32-bit)
|0x01||1||ASCII character 'x' held by function
- Anything before the magic number is omitted (allowing a shebang to be added).
- Integers are of network byte order.
- Anything after the last section is discarded.