Stitching

Getting out to Sheremetyevo Airport in Moscow on a weekday can be quite an undertaking, between the traffic jams and the packed shuttle trains and buses. It takes longer to reach than any other part of the city. Since we had no luck finding time to meet earlier, we had to take extraordinary measures. In a few hours, the Moscow–Cologne flight will take the Levsha team to Gamescom 2019, so to catch up before they leave, we agreed to meet at the very threshold between Russia and the rest of Europe.

— Listen for a moment: “Attention all passengers of Flight #75 to Athens, please proceed to Gate 11 for boarding.” That is exactly what we’re here to discuss: audio stitching in its purest form, a technique put to work long ago.

The announcement is a little hard to make out, like every other one here. An overcomplicated script, drowned out by ambient noise, does not convey information well. By pure chance, we had agreed to meet exactly where the subject of our conversation could be heard at full volume. The Levsha team is here to talk about audio stitching, a recording and production method often used in video game localization.

— Does it have something to do with sewing?

— Well, the term certainly does come from the idea of a stitch. The process closely resembles cutting swatches of cloth to be sewn together into a quilt. What you hear at the airport is a type of concatenative speech synthesis: prerecorded pieces of speech are joined end to end to build each announcement. The name sounds very clever, and almost nobody actually understands how it works. In localization things are much simpler: any set of phrases, together with the combinations that can be built from it, is called audio stitching. You record a handful of words, which are then combined with one another or inserted into other phrases, and that saves a lot of time while reining in the voiceover budget.
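To make the idea of joining recorded pieces concrete, here is a minimal sketch in Python. It assumes each clip is already a list of audio samples (floats) at the same sample rate, and it smooths every seam with a short linear crossfade so the join is less audible. The function name `stitch` and its `fade` parameter are hypothetical, chosen for illustration; the interview does not describe the tooling that Levsha or any airport system actually uses.

```python
from typing import List, Sequence


def stitch(clips: Sequence[Sequence[float]], fade: int = 4) -> List[float]:
    """Join audio clips end to end (hypothetical helper, for illustration only).

    Each clip is a sequence of samples at the same sample rate. At every seam,
    the last `fade` samples of the output are blended with the first `fade`
    samples of the next clip using a linear crossfade, which softens the jump
    in level between two separately recorded pieces.
    """
    if not clips:
        return []
    out: List[float] = list(clips[0])
    for clip in clips[1:]:
        # The overlap cannot be longer than either side of the seam.
        n = min(fade, len(out), len(clip))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming clip rises across the overlap
            out[start + i] = out[start + i] * (1 - w) + clip[i] * w
        # Samples of the new clip past the overlap are appended unchanged.
        out.extend(clip[n:])
    return out


# Usage: two toy "words", one loud and one silent, joined with a 2-sample crossfade.
word_a = [1.0, 1.0, 1.0, 1.0]
word_b = [0.0, 0.0, 0.0, 0.0]
print(stitch([word_a, word_b], fade=2))
```

A linear crossfade is just one simple choice: it hides the abrupt jump in level at the seam, but it does nothing about differences in tone or intonation between the clips, which have to be handled when the lines are recorded.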

— Do translators prepare the ‘stitches’?

— It depends. A translator is a universal soldier, but we can’t always count on them having enough experience for this kind of work. The process is a rare one: I can only recall it coming up on a few large projects, one of which is still successful on the market today. More often than not, you have to prepare the recording materials yourself, although there are some teams out there that can be trusted with the work.

— What makes audio stitching so difficult that not every translator can do it?

— As I said, demand for this kind of audio material is rare. Levsha first encountered audio stitching five years ago. At first it was hard to understand what it was, how it worked, and how it sounded in-game. If you’re still struggling to figure something out yourself, trying to explain it to a newcomer just wastes time and gets on everyone’s nerves. So I had to sit down and pack my brain full of pseudoscientific data, client requirements, and other details until my head started to swell.

— If that was five years ago and you’re still doing this type of work, does that mean you successfully absorbed all that info?

— It only became completely clear when I heard it out in the world. The day after receiving the client’s request, I went home through Paveletsky station and heard an announcement much like the one you just heard. It sounded unnatural: there were pauses between the words, and both the tone and the volume kept jumping around. I got the impression that it… had been sewn together. A shawarma landed at my feet, and I took it as a sign: stitching was meant to be.

— Is it tough to work with?

— Preparing the text doesn’t require anything too hardcore. The first year was pretty wild, though: we would get a request to, say, pick a suitable phrase that used a fricative, and only some serious digging would reveal what that actually meant. Recording was later simplified for Russian, but for French and Spanish the process is still pretty tangled. So the main thing in text preparation is to clearly imagine how a phrase will sound. You can’t work in silence: you have to read the text aloud and record it.

— Could you give the grade-school explanation? A few examples wouldn’t hurt.

— Put simply, a fricative is a noisy, hissing sound. Voiced or unvoiced, fricatives are produced by forcing air through a narrow gap, for example between the tongue and the ridge behind the upper teeth for S, or between the lower lip and the upper teeth for V. S, SH, and V are all fricatives. On a waveform, a fricative shows up as a dense burst of noise, which makes it easier to find a place to cut and join phrases. As for an example… Sports works for me. A hockey season has a lot of games on the schedule: each NHL team plays 82 regular-season games. Now imagine recording a separate phrase for every possible matchup between all 31 teams. With stitching and a well-designed ‘quilt’, simple substitution, without any fricatives involved, lets us make do with only three phrases:

“Another NHL game day…”

“Team A is playing…”

“Team B today.”
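The arithmetic behind the hockey example can be sketched in Python. This is an illustration under stated assumptions: the team names are placeholders, and each team is assumed to be recorded twice, once for the middle slot (“… is playing…”) and once for the final slot (“… today.”), since a name that ends a sentence is usually read with a different intonation than one in the middle. With 31 teams there are 31 × 30 = 930 ordered matchups (465 if the order of the two names doesn’t matter), while the stitched library needs only 1 + 2 × 31 = 63 clips.

```python
from itertools import permutations
from typing import List, Set, Tuple

# A clip is identified by the template slot it fills and the words spoken in it.
Clip = Tuple[str, str]

# Hypothetical placeholder names: the interview does not list the teams.
TEAMS = [f"Team {i:02d}" for i in range(1, 32)]  # 31 teams, as in the 2019 NHL


def clip_library(teams: List[str]) -> Set[Clip]:
    """Every clip that has to be recorded for the three-phrase template."""
    clips: Set[Clip] = {("intro", "Another NHL game day...")}
    # Assumption: each team is recorded once per slot, to match the intonation.
    clips |= {("first", f"{t} is playing...") for t in teams}
    clips |= {("final", f"{t} today.") for t in teams}
    return clips


def announcement(team_a: str, team_b: str) -> List[Clip]:
    """The sequence of clips played back for one matchup."""
    return [
        ("intro", "Another NHL game day..."),
        ("first", f"{team_a} is playing..."),
        ("final", f"{team_b} today."),
    ]


library = clip_library(TEAMS)
matchups = list(permutations(TEAMS, 2))  # ordered pairs: which team is named first matters

# Every announcement is assembled purely from clips already in the library.
assert all(set(announcement(a, b)) <= library for a, b in matchups)

print(f"Whole sentences to record: {len(matchups)}")
print(f"Stitched clips to record:  {len(library)}")
```

The saving grows with the roster: whole sentences scale with the square of the number of teams, while the stitched library grows only linearly.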

— How do the VO actors feel about it?

— Fine now; they’ve gotten used to it. During the first year, the audio director had to explain what to say and how to say it, sometimes stopping the recording, because when lines are read in rapid succession they all start to sound the same, and that can be a problem in this type of session. Different moments in a game need a different tone, but the actors are recording only one or two words at a time. It wasn’t easy, but everyone has adapted, and now the stitched audio stays in line with the timing, may Rosenthal pardon my indiscretions.

— If you’ve figured out how to handle audio stitching, why do the airport’s announcements sound so rough?

— I’m not sure. An airport is a complicated case. Even if you understand something and know how to do it well, it can be hard to make yourself heard in such a sprawling system. Sometimes people think that if something already works, there’s no need to improve it, a view that is, of course, hard to agree with. Ilya Birman has written a great post on his blog about how airport announcements are scripted. The information is important, but for some reason people tend to pad it with extra words that overload the phrase. Our audio stitching doesn’t suffer from this, because the whole point is to make the information easier to convey. Often there is only one word per line.

Another burst of words sewn together into a single soundwave rings out across the concourse. It really does sound quite bad, and it’s hard to make sense of right away. Good audio stitching in a video game, by contrast, goes unnoticed: only the management and the recording studio should know it’s there. Perhaps one day these sprawling, complex systems will open their doors to people who want to make their work easy for others to understand from the get-go. For now, though, a huge line snakes its way toward passport control, with the Levsha team somewhere in it, on their way to Gamescom.