Transliterating Old-School BASIC - XP123

Once in a while, I run into a 1980s-era BASIC program, typically written for Commodore 64, Apple ][, or TRS-80. I’d like to transliterate it into a modern language, refactor it, and explore it. These old programs make a nice halfway point between refactoring assembly language and refactoring a modern language.

Old BASICs have several quirks: syntax limitations, undefined behavior, system-dependent access.

We’ll look at a translation approach that makes a starting point for refactoring.

BASIC Syntax

The typical syntax is:

    Line number   Command

e.g.,

    100 PRINT “HELLO”

Line numbers are positive integers, typically ranging 1…9999.

Spaces within a line are optional other than inside strings. (Thanks, FORTRAN!:)

In some BASICs, a colon (:) can be used to put two or more statements on the same line. If the colon follows a THEN clause, those statements are part of the THEN.

Common commands include: assignment, END, FOR, GOSUB, GOTO, IF..THEN, POKE, PRINT, REM (remark, for comments), RETURN

Less common commands include: DATA (defines constant data), DEF (create one-liner functions), DIM (dimension – create arrays), INPUT, ON..GOTO (computed GOTO), READ (from DATA statements), RESTORE (reset DATA statements), STOP (re-startable stop).

Line Numbers – Implementation

To translate line numbers, we can put the whole code in a loop with a switch statement. This is effectively a state machine (see References) with the line representing the state.

  var line = 1
  while line != 0 {
    switch line {
    case 10:  
      // code for line 10
      goto(20)
      break
    case 20: 
      // code for line 20
      goto(30)
      break
    :
    default:
      line += 1
    }
  }

A statement such as IF, GOTO, or GOSUB that references a line number can consult or set the line variable. The END statement can force the line number to 0. Other statements set it as their last step (or fall through).

If a NEXT or GOSUB appears before a colon, the code after the “:” will be a target, and the line needs to be broken apart. In those cases, I’ll give the split-off part a unique negative line number. Negative line numbers aren’t legal BASIC, but this works for our switch statement.

Variables and Assignment

Different BASICs have different rules about variable names. All the old ones support at least single-letter or letter-number case-insensitive variable names (X, B9). Most have string support with $ at the end (A$, which is different from A). Most let you create one-dimensional arrays (using DIM statements).

Some BASICs let you use two alpha characters for variable names. Some (bizarrely) let you use longer names, but only the first two characters are retained, and if you accidentally form a keyword inside the name, it will get used. The OSI manual (see References) gives an example of a variable ANEW that erases the program when it is executed, because that’s what “NEW” does.

Expressions are fairly sophisticated: parentheses, precedence, *, /, +, -, relationals, booleans.

Some BASICs are integer-only, others support floats (and don’t distinguish integers, JavaScript style).

Type and Range Analysis

In some cases, you can deduce types (int or float) or possible ranges of values.

Example: Assume there are no other assignments to X.

10 X = 53520
   :
100 IF X > 53500 THEN X=X-1
110 IF X < 53530 THEN X=X+1

Since the value starts an an integer, and only adds or subtracts integers, we know it’s an integer, not a float.

Because of the if statements, we know X is in the range 53500…53530.

This is mostly useful for memory-mapped I/O – e.g., PEEK and POKE getting access to the screen memory. Knowing a variable’s range may tell us if it is restricted to screen memory or even to a particular line.

FOR Loops in BASIC

The FOR loop is one of BASIC’s most sophisticated features.

The form is

FOR var = start TO end STEP n
  :
NEXT var

where the STEP clause is optional, as is the variable on the NEXT statement. The variable says which loop to close; no variable means close the innermost active loop.

Loops require a stack to track nested loops. To simulate this, we’ll keep a tuple on the stack:

(variable, end, skip, loop-body)

I’m trying this approach (error-handling ignored):

FOR:
  If var is in the stack { -     // restart the loop
    remove var’s entry from the stack
    — some semantics may instead pop inner loops too
  }
  push tuple
  set var = start
  -- Many BASICs execute the loop body at least once 
  -- (even if start > end)
  -- I don't know how the others determine where the loop ends
  goto(loop-body)

NEXT X:
  Pop until variable X is on top of stack   — undefined if not found
  Handle as plain NEXT

NEXT:
  For variable on top of stack:    — undefined if none
    var = var + skip
    if var <= end then { goto(tuple.loop-body) }
    else { pop ; goto(exit-target) }    — the line after the NEXT

If the textual scope of the loop is clear, you can avoid all this and transliterate it as a “regular” for loop.

Occasionally, you run into a loop like this:

    FOR X = 1 TO 50: NEXT

This is a triumph of pragmatics over semantics. The semantics say it means nothing, and can be deleted. The pragmatics say “these BASICs don’t optimize, so this creates a delay.” I’ll translate these into a call to usleep().

Subroutines in BASIC

Old BASICs’ subroutines are not recursive, and they don’t pass arguments, thus they technically do not require a stack.

That said, a stack is still an easy way to implement them. Put the line number of the statement following the GOSUB onto the stack. It may have to be a generated line number if the GOSUB is on a “:” line.

Implementation of GOSUB X, where the following line is Y: (ignores error handling)

  returnTo.push(Y)
  goto( X)

The RETURN statement is complementary:

    goto(returnTo.pop())

Note that RETURN is not matched up textually, but rather found when the RETURN statement is dynamically encountered.

Refactoring Toward Structure: COME-FROM Analysis

Once we’ve transliterated the code, we often want to refactor it to a modern style.

Although originally offered as a joke (see Lawrence in References), it’s useful to analyze COME FROM – where does control transfer from?

For example:

10 GOTO 50
   :
50 X=0
60 GOSUB 500: Y=10
70 GOTO 50
   :
500 Z=10
   : 
550 RETURN

Because line 60 contains a :, we’ll split it. (We’d have to be more careful if it were part of an “IF”.)

10 GOTO 50
   :
50 X=0                      ***   Comes from 10 or 70
60 GOSUB 500
65 Y=10                    ***  Comes from 550
70 GOTO 50
    :
500 Z=10                  *** Comes from 60
    : 
550 RETURN

We get three useful things from this analysis: single-entry blocks of code, single-target blocks of code, and information about subroutines.

Single-Entry Blocks of Code

The code from one “come from” to the next has only one entry point, though it may have multiple exit points.

In the example above, this forms a block:

50 X=0
60 GOSUB 500

and so does this:

65 Y=10
70 GOTO 50

and this:

500 Z=10
  : 
RETURN

Notice that line 65 is a target of the RETURN statement.

We only need to keep the first line number of a block. Thus, our translation can include both statements under “case 50”.

Since one of the refactorings we want to do is to make the code “structured”, eliminating unnecessary line numbers helps us focus on the larger structure.

Single-Target Blocks of Code

Consider this code:

50 GOTO 80
60 line 60 stuff
70 GOTO 100
80 line 80 stuff
90 line 90 stuff
100 etc

If line 80 is the only the target of line 50, we could move lines 80 and 90 up to replace line 50:

50 line 80 stuff
51 line 90 stuff
52 GOTO 100      — was implicit before
60 line 60 stuff     — presumably a target of some GOTO; omit line 70 and fall through
100 etc

That is, a single-target block of code can be pulled up to the one place that goes to it. (Just make sure to handle both explicit targets and the fall-through between statements.)

Subroutine Information

We get information about the callers of a subroutine. If there’s only one caller, we can inline it by changing GOSUB to GOTO, and changing RETURN to GOTO the statement following the GOSUB.

That inlining may open up more code rearrangement possibilities.

System-Specific Code

A lot of old code uses PEEK and POKE to get at system-dependent aspects: screen, keyboard, graphics modes, occasional assembly-language methods. I’ll explore this in a future article.

Conclusion

Old BASICs aren’t pretty, but it’s fun to translate and modernize some of the old code.

References

Clark, R. Lawrence. “A Linguistic Contribution to GOTO-less Programming”, CACM, April, 1984, pp. 349-350.

Ohio Scientific, Inc. The 8K BASIC-in-ROM Reference Manual. 1978.

Wake, Bill. “State Machines” (parts 1-4). Retrieved 2021-07-11.