The secret to a great-sounding dialog or vocal mix is a good recording: plosive-free, without mouth noise, sans funky contributions from the room, and lacking sibilance. I suppose an inspired, well-performed recording helps a little too. But let’s assume we have a decent voice recording and it’s time to mix/process toward the final audio program.
The way I see it, good voice processing has similarities across media: music, film and TV, game audio, broadcasting/podcasting, theme park shows, talking toys, etc. Typically we want to understand the words, bring out the best qualities of the voice, and optimize for the technical limitations of the playback situation. To that end I have a typical voice processing chain. I don’t see it as a recipe to follow regardless of the context, but rather a starting point to tailor the sound for each unique production.
My first step is probably the most important and definitely the one I’m most likely to apply: Frequency Cuts. Philosophically, I think of this as simplifying the signal closer to its essence. A voice recording usually captures low-frequency information we can’t hear, and it is good to filter that stuff out. When directional microphones are used up close, proximity effect may be audible; a shallow low rolloff might help, as previously discussed. When the presentation will have a limited high-frequency response, I like to use a high rolloff too. If there are “ugly” frequencies in the recording, I’ll use a parametric EQ to find and minimize those. Anything that doesn’t benefit the end result may be minimized using equalization.
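To make the low-cut idea concrete, here is a minimal sketch using numpy and scipy. The sample rate, 80 Hz cutoff, and gentle 12 dB/octave slope are my hypothetical starting points, not rules from the article; tune them by ear for the voice at hand.

```python
import numpy as np
from scipy.signal import butter, sosfilt

# Hypothetical session settings: 48 kHz, high-pass at 80 Hz to clear
# rumble below the voice's fundamental. Order 2 ≈ 12 dB/octave.
sr = 48000
sos = butter(2, 80, btype="highpass", fs=sr, output="sos")

# Stand-in signals: 30 Hz "rumble" vs. 200 Hz "voice" content.
t = np.arange(sr) / sr
rumble = np.sin(2 * np.pi * 30 * t)
voice = np.sin(2 * np.pi * 200 * t)

# The rumble is strongly attenuated; the voice-range tone passes
# nearly untouched.
rumble_out = sosfilt(sos, rumble)
voice_out = sosfilt(sos, voice)
```

The same shape applies whether the filter lives in a DAW channel strip or in code: cut below the voice, leave the voice alone.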
Other cuts may involve spectrum that sounds fine for the voice by itself but won’t be important in the larger mix of elements. This isn’t a significant issue for voice-only or voice-dominated projects. But for song vocals, sonically busy scenes, and other source-rich environments, additional cuts in the voice can free up spectrum for other sounds. These cuts may not be obvious at the very beginning of the mix; they tend to be uncovered over time. Though harder to spot, contextual cuts can be just as helpful toward a great-sounding result.
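A contextual cut of this kind is typically a gentle parametric (peaking) EQ with negative gain. As a sketch, here is the standard RBJ cookbook peaking biquad; the 300 Hz center, -4 dB depth, and Q of 2 are hypothetical values chosen only to illustrate carving a little room for another element.

```python
import numpy as np
from scipy.signal import freqz

def peaking_cut(f0, gain_db, q, fs):
    """RBJ cookbook peaking biquad; negative gain_db gives a parametric cut."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

# Hypothetical cut: -4 dB at 300 Hz, Q of 2, 48 kHz session.
b, a = peaking_cut(300, -4.0, 2.0, 48000)

# Check the response at the center frequency: it lands at the
# requested gain.
w, h = freqz(b, a, worN=[300], fs=48000)
center_db = 20 * np.log10(abs(h[0]))  # ≈ -4 dB
```

By design, a peaking filter leaves the rest of the spectrum largely alone, which is why several narrow cuts can coexist without dulling the voice.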
In some cases a voice may be heavily modulated. Cartoons, monsters, aliens, robots, and many other kinds of voices are severely processed. In these cases, it is important to consider what’s being done to the voice before applying low/high cut filters or parametric EQ. For example, if a varispeed up will be applied (faster AND higher in pitch), a low rolloff may be important to prevent subsonic frequencies from becoming audible. And it may not be obvious what frequency or steepness a high-pass filter should take until hearing the voice post-varispeed. So there may be an iterative process of cutting, mangling, revisiting the cut, revisiting the mangle, etc. There may also be cuts after the voice is morphed. As with all things audio, let your ears inform your decisions.
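The arithmetic behind that varispeed caution is simple: playing a recording at ratio r multiplies every frequency in it by r. This sketch (with made-up numbers) shows why subsonic content can surface, and how to place a pre-varispeed cut.

```python
# Hypothetical varispeed: 1.5x faster, and therefore 1.5x higher in pitch.
# Every frequency in the recording is multiplied by the ratio.
ratio = 1.5

subsonic_hz = 14.0             # inaudible rumble in the raw recording
shifted_hz = subsonic_hz * ratio  # 21.0 Hz: now at the edge of audibility

# To keep the post-varispeed signal clean above some floor, cut the
# original recording at floor / ratio before processing.
post_floor_hz = 60.0           # hypothetical low-cut target after varispeed
pre_cut_hz = post_floor_hz / ratio  # 40.0 Hz pre-varispeed cut achieves it
```

The same reasoning runs in reverse for varispeed down, where high-frequency content slides toward the midrange.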
Cutting frequencies isn’t intuitive; the obvious approach is to boost what we want to hear. So why bother cutting? And why make cuts the first step? If other processors are applied to the voice, I don’t want any of them trying to cope with spectrum we can’t hear, or don’t want to hear. A high-pass filter, and potentially any other useful equalization cuts, seem to work best when applied up front. I try to be judicious here; nothing too crazy. But I am always rewarded for focusing the voice on the parts we want to hear by removing and minimizing the parts that we don’t.
Also in this Voice Processing series: