Tandem Coding

Author: Jürgen Herre

Background information: Coding Distortion

In every standard perceptual codec, the quantization process slightly alters the spectral representation of the input signal. The lower the bitrate, the coarser the quantization required to represent the signal within the target bitrate. This introduces distortion that can be modeled as additive quantization noise ("coding noise"), shaped according to perceptual criteria as estimated by the perceptual model.
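The link between quantizer coarseness and noise level can be illustrated with a small sketch. It is not a perceptual codec: it assumes a plain uniform quantizer applied to random samples, and the names `quantize` and `noise_power` are hypothetical helpers introduced only for this illustration. Under that assumption the noise power is close to step²/12, so doubling the step size raises the noise by about 6 dB.

```python
import math
import random

def quantize(x, step):
    """Uniform quantizer: round x to the nearest multiple of step."""
    return step * round(x / step)

def noise_power(step, n=20000, seed=0):
    """Mean squared quantization error for n random samples in [-1, 1]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        e = quantize(x, step) - x
        total += e * e
    return total / n

# Coarser steps (standing in for lower bitrates) give more noise.
for step in (0.01, 0.02, 0.04):
    measured = noise_power(step)
    predicted = step * step / 12.0  # textbook model for uniform quantization
    print(f"step={step:.2f}  measured={measured:.3e}  predicted={predicted:.3e}")
```

A real perceptual coder additionally varies the step size per frequency band so that the noise stays below the estimated masking threshold; that shaping is deliberately left out here.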

With increasing deployment of low bitrate audio coding, use of audio compression can happen at various stages (contribution to production/studio, distribution between production facilities, emission/transmission of content to consumers, etc.). Since the majority of audio processing and transmission operations are still performed using either uncompressed PCM or analog representations, multiple cycles of decoding, processing, and re-encoding of the audio content often occur. Similarly, a change of audio coding formats and/or bitrates usually requires a decoding/re-encoding cycle for format conversion. Quantization noise is generally added in each cycle, accumulating with each generation and leading to a progressive drop in audio quality. This unfavorable situation is described using various terminology, such as:

  • Tandem Coding
  • Multiple Generations Coding
  • Repeated Encoding/Decoding.
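The accumulation of quantization noise over several generations can be illustrated with a toy simulation. It rests on a simplifying assumption that is not taken from the text: each generation re-quantizes every sample on an independently shifted uniform grid, so the errors of successive generations are uncorrelated. The functions `requantize`, `snr_db`, and `simulate_generations` are hypothetical names for this sketch only.

```python
import math
import random

def requantize(samples, step, rng):
    """Toy codec generation: quantize each sample on a randomly shifted grid."""
    out = []
    for s in samples:
        offset = rng.uniform(0.0, step)  # independent grid shift per sample
        out.append(step * round((s - offset) / step) + offset)
    return out

def snr_db(reference, test):
    """Signal-to-noise ratio of test relative to reference, in dB."""
    signal = sum(r * r for r in reference)
    noise = sum((t - r) ** 2 for r, t in zip(reference, test))
    return 10.0 * math.log10(signal / noise)

def simulate_generations(generations=10, step=0.02, n=20000, seed=1):
    """Return the SNR against the original after each generation."""
    rng = random.Random(seed)
    original = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    current = original
    results = []
    for _ in range(generations):
        current = requantize(current, step, rng)
        results.append(snr_db(original, current))
    return results

for gen, snr in enumerate(simulate_generations(), start=1):
    print(f"generation {gen:2d}: SNR = {snr:.1f} dB")
```

Under this model the noise powers add, so the SNR after N generations is about 10·log10(N) dB lower than after the first. Real codecs behave less regularly, but the downward trend is the same effect described above.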

In 1993, the ITU-R (formerly CCIR) carried out extensive testing of these effects for MPEG-1 Audio codecs [4], noting a marked decrease in audio quality caused by tandem coding. Similar results are reported in [8] and agree well with the results of perceptual measurements for such configurations [5] [6].

Sound examples

The following sound excerpts illustrate the gradual degradation of the repeatedly encoded/decoded audio:

  • Funky: original stereo sound excerpt
  • Funky 1: encoded/decoded, generation #1
  • Funky 2: encoded/decoded, generation #2
  • Funky 3: encoded/decoded, generation #3
  • Funky 4: encoded/decoded, generation #4
  • Funky 5: encoded/decoded, generation #5
  • Funky 6: encoded/decoded, generation #6
  • Funky 7: encoded/decoded, generation #7
  • Funky 8: encoded/decoded, generation #8
  • Funky 9: encoded/decoded, generation #9
  • Funky 10: encoded/decoded, generation #10

The above sound examples were generated by repeated use of a compression scheme at a bitrate which provides good quality encoding after one generation, but does not leave substantial safety headroom to be used by further coding generations. As can be observed, the accumulated distortion quickly becomes audible (i.e. exceeds the masking threshold) and becomes more and more objectionable as the generation number increases.

How to Avoid Quality Loss by Tandem Coding

Clearly, maintaining a high audio quality requires avoiding the adverse effects of tandem coding. Two main rules apply as general recommendations:

Avoid decoding/re-encoding wherever possible

The most effective way to avoid quantization error accumulation is to stay within the coded data format as long as possible. Thus, no further quantization processes introducing additional quantization noise are carried out. In fact, there is often no reason for leaving the coded domain and going via PCM, e.g. for copying purposes. Even when different algorithms are involved, "transcoding" (i.e. the conversion in the coded domain) can probably improve the outcome of tandem coding. If, however, further processing of the signal is required such as level change, equalization or reverberation, a return to the PCM domain is often required. Unfortunately, provisions for interfacing in the coded domain are not yet widely available today.

Provide quality headroom

If decoding/re-encoding of the compressed audio content is necessary, it must be clear that a degradation in signal quality will happen. Thus, in order for the final coded audio to meet a desired target quality, the coding quality at intermediate coding steps must be significantly better than the target quality. In this way, quality losses due to tandem coding can be compensated by increased coding quality (and required bitrate) in intermediate coding steps.
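How much headroom is needed can be estimated with a rough calculation. The sketch below assumes that every generation adds the same amount of uncorrelated noise power, which is a simplification and not a property guaranteed by any particular codec. The function names are hypothetical, and the result is a noise margin in dB, not a bitrate.

```python
import math

def required_headroom_db(generations):
    """Noise margin each coding step needs so that the summed noise of
    `generations` equal, uncorrelated steps just reaches the target level."""
    if generations < 1:
        raise ValueError("generations must be >= 1")
    return 10.0 * math.log10(generations)

def max_generations(headroom_db):
    """Largest number of equal generations that fits into a given margin."""
    return int(math.floor(10.0 ** (headroom_db / 10.0) + 1e-9))

for n in (1, 2, 4, 8):
    print(f"{n} generations -> {required_headroom_db(n):.1f} dB headroom per step")
```

Under these assumptions, each doubling of the number of generations costs about 3 dB of margin per step. How many additional bits per second that margin requires depends on the codec and the material.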

Further ideas for alleviating tandeming loss may help in certain application-specific scenarios:

  • Use digital interfaces and synchronization marks between subsequent decoding/re-encoding steps in order to ensure that their encoder framing coincides [5].
  • In a digital production studio environment, many repeated encoding and decoding steps can occur because the common infrastructure only supports the exchange of audio data as uncompressed audio (e.g. via AES/EBU connections) [2]. To avoid accumulation of signal distortion by tandem coding, it was proposed to attach "helper information", called MOLE data, to the decoded audio signal [2]. Consisting mainly of synchronization marks and quantizer side information, this MOLE information enables a perfect translation back to the original bitstream representation and in this way avoids any distortion accumulation (assuming that there is no processing of the signal in the uncompressed domain, i.e. between the decoding and encoding steps). Based on this approach, an AES standardization project [9] targets the necessary harmonization for carrying such MOLE data over an AES/EBU audio interface. Further considerations on the issue of embedding were presented in [7].
  • Use of a so-called Inverse Decoder [3] can translate a decoded audio signal back into its bitstream representation and thus effectively provide the same benefits as the MOLE-assisted re-encoding, but without the need for an additional side information channel (i.e. the MOLE data). By using the inverse decoder as a MOLE-assisted encoder, the quality of the audio signal will not degrade as long as signal processing (such as filtering or mixing) is avoided.
  • Use audio codecs that have been specifically designed for use in tandem configurations. For example, the codec described in [1] employs algorithmic features allowing several common signal processing operations to occur directly on the compressed signal, hence effectively reducing the number of encode/decode generations. Routine operations such as crossfade editing and level adjustment are possible without decoding to PCM, and the compressed audio bitstream can be directly transported on standard AES/EBU connections. In accordance with the recommendation above, these codecs employ higher bitrates to provide extra quality headroom for those cases when decoding to PCM is unavoidable.
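The benefit of coinciding encoder framing, mentioned in the first item above, can be demonstrated with a toy block-transform codec. This is a sketch under stated assumptions, not the MOLE or Inverse Decoder method: it uses non-overlapping 8-sample DCT frames and a single fixed quantizer step, whereas real codecs use overlapping filterbanks and signal-adaptive quantization. All names (`dct`, `idct`, `codec`, `mse`) are hypothetical helpers for this illustration.

```python
import math
import random

N = 8  # toy frame length (assumption for this sketch)

def dct(block):
    """Orthonormal DCT-II of one frame."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * sum(x * math.cos(math.pi * (n + 0.5) * k / N)
                               for n, x in enumerate(block)))
    return out

def idct(coeffs):
    """Inverse of dct()."""
    out = []
    for n in range(N):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
            total += scale * c * math.cos(math.pi * (n + 0.5) * k / N)
        out.append(total)
    return out

def codec(signal, step, start=0):
    """One encode/decode cycle: quantize the DCT coefficients of each
    complete frame, with framing that begins at sample index `start`."""
    out = list(signal)
    for pos in range(start, len(signal) - N + 1, N):
        coeffs = dct(signal[pos:pos + N])
        quantized = [step * round(c / step) for c in coeffs]
        out[pos:pos + N] = idct(quantized)
    return out

def mse(a, b):
    """Mean squared difference between two equally long signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(2)
original = [rng.uniform(-1.0, 1.0) for _ in range(N * 64)]
gen1 = codec(original, step=0.05)
aligned = codec(gen1, step=0.05, start=0)  # same framing as generation 1
shifted = codec(gen1, step=0.05, start=3)  # framing offset by 3 samples

print("extra error, aligned framing:", mse(gen1, aligned))
print("extra error, shifted framing:", mse(gen1, shifted))
```

With aligned framing and the same step size, the coefficients of the decoded signal already lie on the quantizer grid, so the second generation adds no error beyond floating-point rounding. With shifted framing they do not, and new quantization noise appears. Reusing the same step size stands in here for the quantizer side information that the MOLE data is described as carrying.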


References

[1] L. D. Fielder, S. B. Lyman, S. Vernon, and C. C. Todd, "Professional audio coder optimized for use with video," presented at the 107th Convention of the Audio Engineering Society, New York 1999, preprint 5033.
[2] John Fletcher, "ISO/MPEG Layer 2 - Optimum Re-Encoding of Decoded Audio Using a MOLE Signal," presented at the 104th AES Convention, Amsterdam 1998, preprint 4706.
[3] J. Herre, M. Schug, "Analysis of Decompressed Audio - The Inverse Decoder," presented at the 109th AES Convention, Los Angeles 2000, preprint 5256.
[4] CCIR Doc. 10/52: Draft New Recommendation on Low Bitrate Audio Coding, Doc. Radiocommunications Study Group, 1993.
[5] M. Keyhl, J. Herre, C. Schmidmer, "NMR Measurements on Multiple Generations Audio Coding," presented at the 96th AES Convention, Amsterdam 1994, preprint 3803.
[6] M. Keyhl, C. Schmidmer, J. Herre, J. Hilpert, "Maintaining Sound Quality - Experiences and Constraints of Perceptual Measurements in Today's and Future Networks," presented at the 98th AES Convention, Paris 1995, preprint 3946.
[7] Frank Kurth, "An Audio Codec for Multiple Compression Without Loss of Perceptual Quality," Proc. of the 17th International AES Conference on High Quality Audio Coding, Florence 1999.
[8] S. Ritscher, U. Felderhoff, "Cascading of Different Audio Codecs," presented at the 100th AES Convention, Copenhagen 1996, preprint 4174.
[9] AES41-2000 AES standard for digital audio -- Recording data set audio bit-rate reduction