January 2, 2015 / Randy Coppinger

More Machine Voices

In Part 1 of our discussion about Machine Voices we looked at sonic treatments to make a voice recording sound more like an automaton: re-recording, frequency, time, vocoding, speech synthesis, intentional misuse of tools, and layering.

Before we get into more treatments it is worth noting that sonic effects are not the only factor in making human speech sound like an android voice. When a synthetic intelligence is the goal, script writing and voice acting can help give us robotic clues. For example, HAL 9000 from Kubrick’s 2001: A Space Odyssey speaks without emotion. It’s creepy that the computer has no feelings, evidenced by the lack of word stress or pitch variations that humans naturally use. The classic Robot B-9 from Lost in Space used a monotone delivery to let us know its “Non-Theorizing” status. Many androids have been portrayed by voice actors using a monotone delivery, though pitch variations have also been removed from regularly intoned speech using processing (example: the Cylon Centurions from the late 1970s).

Pace and pause can be used to sound intentionally procedural. C-3PO has quirky pauses and a steady pattern to his dialog that is part of the android speech presentation. Concatenated speech, which we hear when calling for the local time or via MovieFone, demonstrates how it sounds when pre-recorded voice is presented piecemeal, organized by a computer for the listener based on current conditions. Interactive Voice Response is used by airlines, banks, and tech support so callers can get what they need with little or no “live human” time on the line. We can intentionally emulate these unnatural patterns before any treatment is applied.

Editing may also play a role. For example, the mechanical personality Max Headroom stuttered, like a series of bad edits, to let us know the voice was from a machine.
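To make the "series of bad edits" idea concrete, here is a minimal sketch that repeats a short slice of audio a few times before letting the line continue. It assumes the voice is already loaded as a plain Python list of samples; the function name `stutter` and its parameters are made up for illustration and do not come from any particular editor or plugin.

```python
def stutter(samples, start, length, repeats):
    """Repeat samples[start:start + length] `repeats` extra times.

    Hypothetical helper: `samples` is a plain list of audio samples,
    `start` and `length` are measured in samples.
    """
    if length <= 0 or repeats < 0:
        raise ValueError("length must be positive and repeats non-negative")
    end = start + length
    chunk = samples[start:end]
    # Original audio up to the end of the chunk, then the repeated chunk,
    # then the rest of the line.
    return samples[:end] + chunk * repeats + samples[end:]


# Tiny usage example with stand-in sample values.
line = [1, 2, 3, 4, 5]
glitched = stutter(line, start=1, length=2, repeats=2)
```

Short chunks (a few tens of milliseconds) tend to read as a digital hiccup, while longer ones sound more like a deliberately repeated syllable.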

Oscilloscope readout of an amplifier output. This shows a 1 kHz sine wave clipping into 5 ohms. 10V/division.

One of the sonic giveaways that we are listening to a machine is some kind of failure. Subtle to extreme distortion can help convince listeners that electronics, transducers, and the power supplies found in a machine are being used for the reproduction of the voice we are treating. Sometimes the best part of re-amping is pushing the system into distortion, a little or a lot. Of course this can be simulated with software, or by passing signal through a piece of gear like a guitar effects pedal. Sometimes massive distortion mixed back in subtly with the untreated signal helps sell the idea while maintaining intelligibility.
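As a rough software stand-in for that last idea, the sketch below overdrives a signal into hard clipping and then blends a little of the result back in with the untreated signal. It assumes samples are floats in the range -1.0 to 1.0; `hard_clip`, `parallel_blend`, and the drive and mix values are illustrative choices, not settings from any specific pedal or plugin.

```python
import math


def hard_clip(samples, drive=10.0, ceiling=1.0):
    """Boost by `drive`, then flatten anything beyond +/- `ceiling`."""
    return [max(-ceiling, min(ceiling, s * drive)) for s in samples]


def parallel_blend(dry, wet, wet_mix=0.2):
    """Mix the distorted (wet) signal back in under the untreated (dry) one."""
    return [(1.0 - wet_mix) * d + wet_mix * w for d, w in zip(dry, wet)]


# Stand-in for a voice: 10 ms of a 1 kHz sine wave at an 8 kHz sample rate.
sample_rate = 8000
dry = [0.5 * math.sin(2 * math.pi * 1000 * n / sample_rate)
       for n in range(sample_rate // 100)]

wet = hard_clip(dry, drive=20.0)               # massive distortion
mixed = parallel_blend(dry, wet, wet_mix=0.15)  # mostly clean, a little grit
```

Keeping `wet_mix` low is what preserves the consonants; raising it trades intelligibility for more obvious damage.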

Equipment failures don’t have to be limited to electronic or mechanical clip distortion. There is a wealth of opportunities to glitch a voice recording and make it sound less than human. Low resolution digitization, such as 8-bit at 8 kHz, can give you some downright awful sounds. Hint: a lot of talking toys operate down in this range. You can downres using all kinds of different software, not just plugins. Or try hardware such as a guitar stomp box featuring bit crushing effects.
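If you want to hear roughly what a downres does without reaching for a plugin, here is a crude sketch. It assumes samples are floats between -1.0 and 1.0, and both function names are invented for this example. The sample rate reduction deliberately skips the anti-alias filter a proper converter would use, so the aliasing becomes part of the sound.

```python
def downsample_hold(samples, factor):
    """Keep every `factor`-th sample and hold it until the next one.

    No anti-alias filtering is applied, on purpose. A factor of 6 takes
    48 kHz material down to an effective 8 kHz.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return [samples[(i // factor) * factor] for i in range(len(samples))]


def bit_crush(samples, bits=8):
    """Quantize floats in [-1.0, 1.0] onto a grid of 2**bits levels."""
    levels = 2 ** (bits - 1)
    top = 1.0 - 1.0 / levels  # highest positive level on a signed grid
    return [max(-1.0, min(top, round(s * levels) / levels)) for s in samples]


# Usage: crush a slow ramp down to toy-like resolution.
ramp = [i / 1000 for i in range(-1000, 1001)]
toy = bit_crush(downsample_hold(ramp, 6), bits=8)
```

Dropping `bits` below 8 gets harsh quickly, so it is worth sweeping it while listening rather than picking a number up front.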

@stonevoiceovers suggested using the Pro Tools AIR Frequency Shifter. Early pitch processing such as the Eventide H910 Harmonizer could sound garbled and glitchy, especially when pushed to extremes. Most pitch processing still sounds pretty synthetic with excessive settings. One idea is to simply pitch something way up or down and then process it again to return to normal pitch. The filters and shuffling will add some great artifacts. Of course you can keep the pitch changes, even variable pitch change, to make something both simulated and glitchy at the same time.
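Here is a small sketch of the pitch-away-and-back round trip. To be clear about the swap: this uses plain varispeed-style resampling with linear interpolation, not a time-preserving pitch shifter like a Harmonizer, so the artifacts come from lost and aliased high frequencies rather than from shuffling. The function name `resample_linear` is made up for the example.

```python
def resample_linear(samples, speed):
    """Read through `samples` at `speed` times normal rate.

    speed > 1 raises pitch and shortens the audio, speed < 1 lowers pitch
    and lengthens it. Linear interpolation, no filtering.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    out = []
    last = len(samples) - 1
    pos = 0.0
    while pos <= last:
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
        pos += speed
    return out


# Round trip: up an octave, then back down to the original pitch and length.
original = [1.0, -1.0, 1.0, -1.0, 1.0]   # stand-in for bright, sibilant content
up = resample_linear(original, 2.0)
round_trip = resample_linear(up, 0.5)
```

The result has the same length as the original but not the same content, which is the point: whatever the first pass throws away does not come back on the return trip.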

You don’t have to pitch process to get chopped, stuttered sounds. There are plugins and hardware effects processors dedicated to these kinds of glitch effects. We can even get these kinds of sounds by abusing a simple tremolo or vibrato processor. And if you’re not afraid to experiment, try circuit bending an inexpensive consumer product that records and plays sound. Cheap toys can be especially fun to hack.
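One way to picture "abusing" a tremolo: swap the usual gentle sine modulator for a square wave at full depth, and the effect stops wobbling and starts chopping. The sketch below assumes a list of float samples; `square_tremolo` and its parameters are illustrative names, not a real plugin's controls.

```python
def square_tremolo(samples, sample_rate, rate_hz, depth=1.0):
    """Amplitude-modulate with a square wave instead of a smooth sine.

    At depth=1.0 the voice is fully muted for half of every cycle,
    which turns a tremolo into a hard gate.
    """
    out = []
    for n, s in enumerate(samples):
        phase = (n * rate_hz / sample_rate) % 1.0  # position within the cycle
        gain = 1.0 if phase < 0.5 else 1.0 - depth
        out.append(s * gain)
    return out


# Usage: a constant signal makes the on/off pattern easy to see.
chopped = square_tremolo([1.0] * 8, sample_rate=8, rate_hz=2, depth=1.0)
```

Rates of a few cycles per second sound like a failing connection; push the rate up toward audio frequencies and the chopping turns into a buzzy, ring-modulated tone.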

Some of my favorite plugins make processing easier by having several techniques readily available on a single display. For mediated voice futzes it’s hard to beat Speakerphone by Audio Ease. It has tons of great re-recording techniques simulated using convolution, with both speaker and microphone emulations included. Then you can pile on frequency manipulations with EQ, overdrive, room reflections, telecom codec simulations, and much more.

Whereas Speakerphone is a quick, accurate path for emulating the real world, FutzBox offers a palette of sound-changing parameters to play around with and create your own flavor. Start with a speaker emulation, then select options to downres, filter, overdrive, even add a noise layer. The interface makes it fun to experiment with different combinations.

I’ve worked in several studios that had a rackmount Eventide H3000 Harmonizer. If you ever have the opportunity to play with this kind of quirky box full of crazy treatments, indulge your ears with synthetic weirdness. A significant number of robot voices have been created using Harmonizers over the years. Eventide makes plugins these days, which is a more convenient way for everyone to access multi-effect sounds.

Ben Burtt is a living legend. His creative use of sound tools inspires us. From classic ARP synthesizers to the Kyma, he blurred the lines between human voice and machine with iconic robots from R2-D2 to WALL-E. We don’t have to use someone else’s plugin; we can devise our own treatment paths too. Techniques like vocoding and speech synthesis demonstrate a confluence of artistic and technical thinking that asks us to create new processes. But powerful tools with no specific voice processing agenda require patience to wield well, and a steep learning curve may not coincide with an inspired moment. These are deep waters. Come mentally prepared.

Native Instruments makes powerful music-oriented software like Kontakt for sampling, and Reaktor for synthesis. To write your own code consider Pure Data and Max/MSP. More tool recommendations: 10 Great Tools For Sound Designers, What’s The Deal With Procedural Game Audio, and a Google search for new ideas in sonic tools, including audio-related discussion groups.

(1) Levels. Metallic resonances and other synthetic treatments can generate crazy level spikes. Watch your input and output levels so you don’t produce unwanted clipping.
(2) Dynamics. Just because your dialog was compressed and/or limited before you treated it, doesn’t mean you can ignore dynamics after. Consider another pass through dynamics processing so your low level sounds don’t get lost, and the newly created signal peaks don’t prevent you from setting this dialog loudness on par with everything else.
(3) Diction. When you mangle in the 2 kHz – 5 kHz range the intelligibility and presence of the dialog may become diminished. Our ear/brain system uses the 6 kHz – 8 kHz range to distinguish S sounds from F sounds, with potential confusion for other sounds like TH, SH, or CH. Sometimes you just need EQ to enhance these areas. Other times you may need to blend in some unprocessed or less processed voice in these frequency ranges to recover the diction that was lost from treatment. If you lower resolution, remember Nyquist showed that the sampling rate must be at least twice the highest frequency we want to represent, meaning a downres to 8 kHz limits the highest reproducible frequency to only 4 kHz!
(4) Distortion. If you choose overdrive as part of your processing chain, keep in mind that distortion is a form of dynamic range compression. If you’re unsure whether to use it or not, remember that distortion can be a substitute for, or complement to, any dynamics processing needed after applying other techniques. Even a little intentional clipping could help improve the signal chain, from a simple futz to a full-on sentient machine.
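A couple of these checks are easy to put numbers on. The sketch below measures the peak level of a treated signal, trims it back under a chosen ceiling, and works out the frequency limit for a given sample rate. It assumes float samples where 1.0 is digital full scale; the function names and the -1 dBFS target are arbitrary choices for illustration, not a delivery spec.

```python
import math


def peak_dbfs(samples):
    """Peak level in dB relative to full scale (1.0)."""
    peak = max((abs(s) for s in samples), default=0.0)
    return float("-inf") if peak == 0 else 20 * math.log10(peak)


def trim_to_peak(samples, target_dbfs=-1.0):
    """Apply one static gain so the loudest sample lands at target_dbfs."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)
    gain = 10 ** (target_dbfs / 20) / peak
    return [s * gain for s in samples]


def nyquist_limit(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


# A treated signal with a resonant spike that overshoots full scale.
treated = [0.2, -0.4, 2.0, -0.5]
safe = trim_to_peak(treated, target_dbfs=-1.0)
limit = nyquist_limit(8000)  # 4000.0, so the 6 kHz to 8 kHz S/F cues are gone
```

A static trim only protects against clipping. It does nothing for the low level sounds mentioned in tip (2), which is why another pass of compression or limiting may still be needed.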

If you’ve got any tips, tricks, or other suggestions please share. Have fun making Machine Voices!

