Let’s look at the threaded-code approach. It’s a less-well-known way to build an interpreter for a programming language. (The more common approach uses a loop that decodes and simulates each instruction.) Threaded code has a high-level analog, and some extra tricks when done in assembly language.
Core Idea
A threaded-code interpreter builds and runs pipelines of functions. Although it’s most often used for programming languages, consider an image transformation tool instead.
We have primaries that are the basic image transforms: scale, rotate, change contrast, etc. We may also build secondaries out of primaries and other secondaries. For example, when I post a video on YouTube, they let me add a thumbnail image. To create it, I take a screenshot, then send that through a script that scales, increases contrast, and reduces the JPEG quality. This saves me from opening Gimp and making those changes manually.
To use a threaded-code style approach, build a Composite-like structure where primaries are the leaves and secondaries are compound objects, each listing other primaries or secondaries.
In Swift, I’d use a recursive enum; for Java, C#, etc., I’d use subclasses. It would look something like this:
indirect enum Transforms {
    case scale(factor: Double)     // a primary, with its control parameters
    case rotate(degrees: Double)
    case contrast(amount: Double)
    case secondary([Transforms])
}
We’d then create a method that runs primaries directly, and handles secondaries by walking through the list of Transforms and applying them in order. The result of the final transform becomes the result of the secondary.
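As a concrete sketch, here’s a minimal, runnable version in Swift. The particular primaries (scale and contrast), their parameters, and the use of a single Double as a stand-in “image” are all invented for illustration:

```swift
// A Composite-like structure: primaries are leaves; a secondary lists steps.
// scale/contrast and the Double "image" are hypothetical simplifications.
indirect enum Transforms {
    case scale(factor: Double)     // primary, with its control parameter
    case contrast(amount: Double)  // primary standing in for a contrast tweak
    case secondary([Transforms])   // compound: other primaries or secondaries

    func apply(to value: Double) -> Double {
        switch self {
        case .scale(let factor):
            return value * factor                    // run primary directly
        case .contrast(let amount):
            return value + amount
        case .secondary(let steps):
            // Walk the list in order; each result feeds the next transform.
            return steps.reduce(value) { $1.apply(to: $0) }
        }
    }
}

// A "thumbnail" secondary, like the YouTube example: scale, then contrast.
let thumbnail = Transforms.secondary([.scale(factor: 0.5), .contrast(amount: 10)])
print(thumbnail.apply(to: 100))   // prints 60.0
```

The result of the last step in a secondary’s list falls out of the reduce as the secondary’s own result.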
Aside: Programming Languages
We don’t always think of programming this way, but a step in a program can be looked at as a transform: it takes the current state as input and produces a new state as output.
Because programming languages are so general, we should expect the tree-style approach to work in many situations, including images (2D and 3D), sound editing, text styles, text transformation, build pipelines, SQL queries, data preparation, search, and more.
Threaded Code: The Assembly Language Twist
[You’ll be forgiven if you skip this section:)]
I’ve been aware of the threaded-code technique for a while, but I’d never implemented it in assembly language. Lately, I’ve been doing just that: implementing a Forth-like programming language. (See the Bibliography.) Note: I haven’t yet applied the twist (an optimization) described below, but it’s in my plans:)
Setup
To treat primaries and secondaries uniformly, we’re going to give each a block with a one-word header, containing the address of the routine that should interpret it. The address of the header is called the word address (since this programming language defines “words”).
For primaries, the header points to the primary’s code:
All secondaries will point to a single routine I’ve called start2d. Start2d knows how to walk through the list of addresses in its block.
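To make the setup concrete, here’s a hypothetical block for one secondary, in GNU-assembler style. The word SQUARE and the primary MUL are invented for illustration; .quad lays down one 8-byte word per row:

// Hypothetical block for a secondary SQUARE (computes n*n as DUP MUL).
// The label is the word address; the first word is the header.
SQUARE:
    .quad start2d    // header: address of the routine that interprets it
    .quad DUP        // body: word addresses of primaries or secondaries
    .quad MUL        // (MUL is an assumed primary, shown for illustration)
    .quad end2d      // required terminator for every secondary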
Code
Let’s look at the code for DUP (duplicate the top of the data stack), start2d, and end2d.
DUP:
dup:
DATA_TOP x0 // Load top of data stack to x0
DATA_PUSH x0 // Push contents of x0 onto data stack
ret // Return
This code reads the top of the stack, pushes a copy, and returns. (DATA_TOP and DATA_PUSH are macros for working with the data stack.) Its header lives elsewhere.
Start2d:
start2d:
str lr, [sp, #-16]! // push LR and...
str x20, [sp, #8] // VPC to system stack
add x20, x0, #8 // start VPC just past the header
L_interpreter_loop:
ldr x0, [x20], #8 // Load word address, increment VPC
ldr x1, [x0] // Load address of code
blr x1 // Call method
b L_interpreter_loop // Repeat
The first two str (store) instructions save call information to the system stack. The add instruction sets X20, the VPC (virtual program counter), which keeps track of which row we’re on.
Then we hit the main loop. The first ldr (load register) instruction reads the row: the word address, a pointer to the header. The second ldr then reads the header block, getting the code address. Then blr (branch with link to register) calls the routine the register points to. When that routine returns, b (branch) takes us back to the top of the loop.
End2d:
end2d:
ldr x20, [sp, #8] // Restore VPC and...
ldr lr, [sp], #16 // LR from system stack
ret // Return to start2d's caller
We exit the interpreter’s loop by running this routine. It first restores the information we pushed on the stack at the start. The key is that we now have the lr (link register) that was in effect when we first called start2d. When we execute ret, we return to the original caller.
The way we get here is a little convoluted. It works because we require every secondary to have an end2d at the end of its list.
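For contrast, the same machinery can be sketched in a high-level language. In this Swift sketch (assumptions for brevity: a global Int data stack, and closures standing in for machine code), a secondary’s run plays the role of start2d, and returning from the loop plays the role of end2d:

```swift
// High-level analog of the inner interpreter. The data stack is a global
// here for brevity; primaries are closures rather than machine code.
var dataStack: [Int] = []

indirect enum Word {
    case primary(() -> Void)
    case secondary([Word])

    func run() {
        switch self {
        case .primary(let code):
            code()                          // run a primary directly
        case .secondary(let body):
            body.forEach { $0.run() }       // walk the list, like start2d
        }                                   // returning here is end2d's job
    }
}

let dup = Word.primary { dataStack.append(dataStack.last!) }
let mul = Word.primary {
    let b = dataStack.removeLast()
    let a = dataStack.removeLast()
    dataStack.append(a * b)
}

// A hypothetical secondary SQUARE, built from the primaries above.
let square = Word.secondary([dup, mul])
dataStack = [5]
square.run()
print(dataStack)   // prints [25]
```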
A Trace
Consider the code sequence DUP 1 SUB. This duplicates the top item, pushes 1, and subtracts; thus it’s an idiom for “push a−1 on the stack”, where a is the value at the top. If a starts on top, DUP leaves a a, pushing 1 leaves a a 1, and SUB leaves a a−1.
If you trace what’s happening (key moments only), you’ll see this:
:
L_interpreter_loop:
:
blr // call DUP
:
ret // from DUP
b L_interpreter_loop
L_interpreter_loop:
:
blr // call "1" routine
:
ret // from "1"
b L_interpreter_loop
L_interpreter_loop:
:
blr // call SUB
:
ret // from SUB
b L_interpreter_loop
L_interpreter_loop:
:
blr // call end2d
:
ret // from start2d
This dynamic view shows a recurring pattern:
ret
b L_interpreter_loop
This pattern isn’t as obvious statically.
The Twist
The unexpected twist is that we can eliminate procedure calls from our core loop.
The ret (return) at the end of a primary always jumps to the same place: the “b L_interpreter_loop” instruction. So, let’s make the last line of each primary be “b L_interpreter_loop”. Instead of executing a ret followed by a b, each primary branches straight back to the loop, cutting out one jump.
Since we now don’t want primaries to return, we shouldn’t use blr in start2d, as it’s for calling a routine. Instead, we can jump (br) to wherever the register says. This might be a little faster and is certainly less work.
This work is in the critical path of the start2d routine, so it will save time for every secondary we use.
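Put together, the twist might look like this (a sketch I haven’t yet verified; DATA_TOP and DATA_PUSH are the same macros used in DUP above):

// Sketch of the optimized dispatch: primaries end with a branch, not ret.
dup:
    DATA_TOP x0            // Load top of data stack to x0
    DATA_PUSH x0           // Push contents of x0 onto data stack
    b L_interpreter_loop   // Was: ret -- branch straight back to the loop

L_interpreter_loop:
    ldr x0, [x20], #8      // Load word address, increment VPC
    ldr x1, [x0]           // Load address of code
    br x1                  // Was: blr x1 -- jump, don't call

Note that end2d still ends with ret; it’s the one place we leave the loop and return to the original caller.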
What Have We Lost?
I find this approach interesting because it takes us back to a world without routines in the usual sense. Most programming languages use them implicitly, whether they call them subroutines, procedures, functions, or methods.
Normal call-and-return is only one option; the threaded-code approach uses indirect branches instead.
Another alternative model is coroutines. Instead of starting at the beginning of the routine every time, a coroutine picks up where it last left off. C# and Kotlin both have such an ability, but I’ve only seen people use coroutines to build generators (returning the next value in a sequence according to some systematic enumeration). I haven’t yet seen them used as true mutual coroutines; I’d like to explore this further.
Other alternative models are possible: Lisp, Prolog, and static functional languages clearly have a somewhat different model in mind, but they’re shoehorned into the procedural model.
Conclusion
We explored the idea of threaded code, whether implemented as a tree structure in a high-level language, or via indirect jumps in assembly language. We looked at an optimization twist for the assembly implementation, and briefly considered some unusual alternatives to the procedure model.
Next time you have a situation where you have transformations and groups of transformations, try the threaded-code approach and see how it fits!
Bibliography
“Assembler“, YouTube playlist. Retrieved 2024-03-03. These are videos from the sessions where we’re exploring some of these ideas.
“Threaded code“, Wikipedia. Retrieved 2024-03-03.