Interface Seams: Sound and Computers

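Computers can be involved with sound in several ways: generating it directly, controlling sound hardware, recording and playing it back, synthesizing and manipulating it digitally, and producing or recognizing speech.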

Direct Generation

The computer can be used to generate sound directly. One approach is to put a transistor radio next to the computer. The switching of the transistors causes sharp transitions, which generate radio waves that the radio picks up. (This approach was used about 25 years ago; modern computers may run at too high a frequency for it to work.) Purportedly, you could detect loop patterns in a running program by listening.

Another approach was to "program" the line printer. Striking different patterns of letters produced different (if tinny) pitches, so by judicious printing you could play tunes.

Computer Control

In the 1980s, many personal computers included "sound chips" that could be programmed. The computer would talk to the chip as a peripheral, setting parameters for pitch, amplitude, attack, decay, filtering, etc. This was used particularly for sound effects in games.
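To make the attack and decay parameters concrete, here is a minimal sketch of an amplitude envelope computed in software. It is only a model: real sound chips did this in hardware, and the function name envelope, its linear ramps, and its measuring of time in samples are assumptions made for illustration, not any particular chip's behavior.

```python
def envelope(n_samples, attack, decay, peak=1.0):
    """Return n_samples loudness levels between 0.0 and peak.

    The level rises linearly to `peak` over `attack` samples, falls
    linearly back to zero over `decay` samples, then stays silent.
    (Hypothetical model; attack and decay are counted in samples.)
    """
    levels = []
    for i in range(n_samples):
        if i < attack:
            # Attack phase: ramp up from silence.
            levels.append(peak * i / attack)
        elif i < attack + decay:
            # Decay phase: ramp back down to silence.
            levels.append(peak * (1 - (i - attack) / decay))
        else:
            levels.append(0.0)
    return levels


print(envelope(6, attack=2, decay=2))  # [0.0, 0.5, 1.0, 0.5, 0.0, 0.0]
```

Multiplying each sample of a tone by the matching envelope level gives the tone its shape: a short attack sounds percussive, a long one sounds like a swell.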

MIDI control is similar. (MIDI, the Musical Instrument Digital Interface, is a standard for communication among synthesizers, keyboards, and computers.) The computer may act as a mediator between a MIDI keyboard and a sound system, perhaps storing patches as well. It may also act as an intelligent intermediary, modifying the messages that pass through it.
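As a sketch of the intermediary role, the following assumes the computer has already received a MIDI message as raw bytes and wants to transpose it before passing it on. The three-byte layout of Note On and Note Off messages (status byte, note number, velocity) comes from the MIDI standard; the function name transpose and the choice to clamp notes to the range 0 to 127 are assumptions for illustration. Reading from and writing to an actual MIDI port is left out.

```python
NOTE_OFF = 0x80  # status nibble for Note Off
NOTE_ON = 0x90   # status nibble for Note On


def transpose(message: bytes, semitones: int) -> bytes:
    """Shift the note number of a Note On/Off message.

    Any other message is passed through unchanged. (Hypothetical helper.)
    """
    if len(message) != 3:
        return message
    status, note, velocity = message
    # The high nibble is the message type; the low nibble is the channel.
    if status & 0xF0 not in (NOTE_ON, NOTE_OFF):
        return message
    # MIDI note numbers are 7-bit values, so clamp to 0..127.
    new_note = min(127, max(0, note + semitones))
    return bytes([status, new_note, velocity])


middle_c = bytes([NOTE_ON | 0, 60, 100])  # channel 0, note 60, velocity 100
print(transpose(middle_c, 7).hex())       # 904364: same message, note 67
```

The same pass-through shape works for other modifications, such as scaling velocity or rerouting to a different channel.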

Digital Recording and Playback

You can combine an analog-to-digital converter, computer storage, and a digital-to-analog converter: the first digitizes the sound, the computer stores it, and the last plays it back. This technique was used for such things as talking alarm clocks, where a fairly small set of recorded words could cover everything the clock needed to say.

Digital Generation and Manipulation

Once you’ve got the ability to play back a digital signal, you can let the computer generate the sounds. (A very early system, the MUSIC programs begun at Bell Labs in the late 1950s and developed through the 1960s, used this approach.) It’s fairly expensive: to reproduce a pitch you must sample at no less than twice its frequency, so covering the full audible range (up to about 20 kHz) means generating about 40,000 16-bit stereo samples for each second of sound. The MUSIC system defined a paradigm of oscillators, filters, etc. hooked up in a virtual patch panel.
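A single oscillator is the simplest piece of that patch panel. The sketch below computes one second of a sine tone as 16-bit sample values and works out the data rate implied by the figures above. The function name sine_samples and its parameters are assumptions for illustration, and the 40,000 rate is simply the round number used in the text (CD audio uses 44,100).

```python
import math

SAMPLE_RATE = 40_000   # samples per second, the round figure from the text
PEAK_16_BIT = 32767    # largest positive value of a signed 16-bit sample


def sine_samples(freq_hz, seconds, amplitude=0.5, sample_rate=SAMPLE_RATE):
    """Return a list of signed 16-bit sample values for a sine tone."""
    n = int(seconds * sample_rate)
    return [
        int(round(amplitude * PEAK_16_BIT
                  * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n)
    ]


tone = sine_samples(440.0, 1.0)          # one second of the A above middle C
bytes_per_second = SAMPLE_RATE * 2 * 2   # 2 bytes per sample, 2 channels
print(len(tone), bytes_per_second)       # 40000 160000
```

Every one of those samples has to be computed, which is where the expense comes from: a patch with several oscillators and filters multiplies the work per sample.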

Sound manipulation can use this same mechanism. For example, you might get an echo by taking a sound, delaying it a fraction of a second, and mixing in the delayed sound at reduced volume.
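The echo just described can be sketched in a few lines, assuming the sound is held as a list of sample values. The function name add_echo and the default gain of 0.5 are assumptions for illustration; the delay is given in samples, so a quarter-second echo at 40,000 samples per second would be a delay of 10,000.

```python
def add_echo(samples, delay_samples, gain=0.5):
    """Mix a delayed, quieter copy of the sound back into itself.

    The result is longer than the input by delay_samples so the tail
    of the echo is not cut off. (Hypothetical helper.)
    """
    out = list(samples) + [0.0] * delay_samples
    for i, s in enumerate(samples):
        # Each original sample reappears delay_samples later, scaled down.
        out[i + delay_samples] += gain * s
    return out


click = [1.0, 0.0, 0.0, 0.0]
print(add_echo(click, delay_samples=2))  # [1.0, 0.0, 0.5, 0.0, 0.0, 0.0]
```

With integer samples you would also need to guard against the mixed value overflowing the sample range; the sketch sidesteps this by using floating-point values.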

Speech

Speech generation is far from perfect. Various sound models have been proposed and tried. Speech output requires addressing more than just "pure sound": it must take pronunciation, dynamics, intonation, and other factors into account. (Speech recognition is in even worse shape: it’s still very expensive, though improving each year.)