Perceptual Noise Substitution

Author: Jürgen Herre
Co-Author: Sascha Dick

Background: Perceptual Noise Substitution

Perceptual Noise Substitution (PNS) [1] [2] is a coding tool for perceptual audio coders that is based on the idea that noise-like signal components cannot be distinguished from each other if they have a similar spectro-temporal envelope. Thus, such signal components can be identified at the encoder side and signaled to the decoder for re-synthesis rather than being transmitted as quantized spectral coefficients. This is carried out individually for each coder frequency band and time frame. PNS processing can save a substantial amount of bitrate, since only the substitution flag and the time/frequency envelope of the substituted signal components have to be transmitted.
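The per-band substitution described above can be sketched as follows. This is a simplified illustration, not the MPEG-4 AAC bitstream syntax. The hypothetical helpers `pns_encode`/`pns_decode`, the band edges, and the noise flags (which a real encoder's control would decide) are assumptions, and quantization of both the energies and the waveform-coded coefficients is omitted for brevity.

```python
import numpy as np

def pns_encode(spectrum, band_edges, noise_flags):
    """Per band: keep the coefficients, or transmit only a substitution flag and the band energy."""
    params = []
    for lo, hi, is_noise in zip(band_edges[:-1], band_edges[1:], noise_flags):
        band = spectrum[lo:hi]
        if is_noise:
            # Substituted band: a single energy value replaces all coefficients.
            params.append(("pns", float(np.sum(band ** 2))))
        else:
            # Waveform-coded band (quantization omitted in this sketch).
            params.append(("coeffs", band.copy()))
    return params

def pns_decode(params, band_edges, rng):
    """Rebuild the spectrum; substituted bands are filled with noise of the transmitted energy."""
    out = np.zeros(band_edges[-1])
    for lo, hi, (kind, value) in zip(band_edges[:-1], band_edges[1:], params):
        if kind == "pns":
            noise = rng.standard_normal(hi - lo)
            # Scale the random noise so that its energy equals the transmitted band energy.
            out[lo:hi] = noise * np.sqrt(value / np.sum(noise ** 2))
        else:
            out[lo:hi] = value
    return out

# Usage: 4 bands of 16 coefficients; the two upper bands are flagged as noise-like.
spectrum = np.random.default_rng(1).standard_normal(64)
edges = [0, 16, 32, 48, 64]
flags = [False, False, True, True]
decoded = pns_decode(pns_encode(spectrum, edges, flags), edges, np.random.default_rng(2))
```

The decoded substituted bands match the original only in energy, not in waveform, which is why the approach is only safe for components that are perceptually indistinguishable from noise.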

The PNS technique is part of the MPEG-4 AAC [2] and MPEG-4 High Efficiency AAC (HE-AAC v1 and v2 [3]) codecs. Later refinements of this basic idea are implemented, e.g., in the Noise Filling tool of xHE-AAC [4] and in the Intelligent Gap Filling tool [5], which allow a much more flexible mixture of parametric coding and waveform coding within a common framework.

In practice, parametric coding tools in general, and PNS in particular, require a well-designed encoder control to ensure that the underlying assumptions of the coding tool are properly met and that perceptible artifacts are thus avoided.

Sound Examples

The following demonstration signals illustrate the types of artifacts that can result from improper use of PNS and how these artifacts can be avoided by proper control mechanisms. They were generated with an MPEG-4 AAC codec running at a bitrate of 128 kbit/s for a stereo signal, i.e. at a comparatively high bitrate, so that only PNS-related artifacts are showcased (rather than quality degradation due to low bitrate and coarse quantization).

The first set of demonstration signals shows how much of the signal spectrum can be substituted by PNS for an uncritical signal, in our case a pop song (“Funky”), and for more critical signals (“Glockenspiel”, “German Male Speech”). Please first listen to the original and to the encoded/decoded version without PNS as quality references. Then proceed with the versions using PNS above certain start frequencies, in descending order, and observe the gradually increasing quality degradation.
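To get a feeling for how much of the spectrum a given start frequency hands over to PNS, the following sketch counts spectral lines. It assumes an AAC long-block frame with 1024 spectral lines covering 0 to fs/2 and a sampling rate of 48 kHz; the sampling rate of the demo items is not stated on this page. The helper names are hypothetical. Real encoders switch PNS per coder band, so the start frequency effectively snaps to a band edge, and the encoder's audio bandwidth limit is ignored here, so the computed share is only an upper bound.

```python
import math

def pns_start_bin(f_start_hz, fs_hz, n_bins=1024):
    """Index of the first spectral line at or above f_start_hz (n_bins lines cover 0..fs/2)."""
    bin_width_hz = (fs_hz / 2) / n_bins
    return math.ceil(f_start_hz / bin_width_hz)

def substituted_fraction(f_start_hz, fs_hz, n_bins=1024):
    """Share of the spectral lines lying at or above the PNS start frequency."""
    return (n_bins - pns_start_bin(f_start_hz, fs_hz, n_bins)) / n_bins

print(pns_start_bin(4000, 48000))                   # 171 (23.4375 Hz per line)
print(round(substituted_fraction(4000, 48000), 3))  # 0.833
```

Under these assumptions, a start frequency of 4 kHz already makes more than four fifths of the spectral lines eligible for substitution.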

It is amazing to experience how much of the signal spectrum of suitable music material can be replaced by synthetic noise without gross degradation in subjective quality! For the given “Funky” signal, this is possible because it does not contain strongly tonal components. In contrast, the other test signals are critical for PNS processing: “Glockenspiel” has both strongly tonal and strongly transient signal characteristics, while “Male Speech” consists, in its voiced parts, of clearly separated glottal excitation pulses with a distinct temporal fine structure. Both signals are fundamentally different from stationary noise and are thus difficult to substitute by PNS without producing artifacts.
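Why pulse-like signals such as voiced speech resist substitution can be illustrated with a small experiment, offered as a sketch rather than a model of the actual coder. An idealized pulse train (a crude, assumed stand-in for glottal excitation) keeps its exact magnitude spectrum but receives random phases. PNS preserves even less than that, namely only band energies, so if phase randomization already destroys the temporal fine structure, noise substitution will as well. The crest factor (peak-to-RMS ratio) serves as a simple measure of that structure; the helper names are hypothetical.

```python
import numpy as np

def crest_factor(x):
    """Peak-to-RMS ratio: high for pulse-like signals, low for noise-like ones."""
    return float(np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2)))

def randomize_phase(x, rng):
    """Keep the magnitude spectrum exactly but replace all phases with random ones."""
    X = np.fft.rfft(x)
    phase = rng.uniform(0, 2 * np.pi, X.shape)
    phase[0] = 0.0          # the DC bin must stay real
    if len(x) % 2 == 0:
        phase[-1] = 0.0     # so must the Nyquist bin for even lengths
    return np.fft.irfft(np.abs(X) * np.exp(1j * phase), n=len(x))

rng = np.random.default_rng(0)
n, period = 4096, 64
pulses = np.zeros(n)
pulses[::period] = 1.0      # idealized pulse train, one pulse every 64 samples
substituted = randomize_phase(pulses, rng)

print(crest_factor(pulses))       # 8.0
print(crest_factor(substituted))  # lower: the pulses are smeared into a noise-like waveform
```

Both signals have the same magnitude spectrum and energy, yet only the original shows the distinct peaks that the ear perceives as the temporal fine structure of voiced speech.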

The next demonstration underlines the importance of properly controlling such algorithms. For both “Glockenspiel” and “Male Speech”, the following processing variants are presented:

  • Reference encoded/decoded (no PNS)
  • PNS always on (above 4 kHz)
  • PNS processing with “tonality check”, i.e. PNS is disabled for tonal signal components
  • PNS processing with “time envelope check”, i.e. PNS is disabled for signal portions with a strong temporal variation of the envelope, such as transients
  • PNS processing with both “tonality check” and “time envelope check”
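The two checks listed above can be sketched as a simple per-frame decision. This is a hypothetical control, not the one used to produce the demo items: the tonality check is approximated here by the spectral flatness measure, the time envelope check by the ratio of the largest to the mean sub-block energy, and the thresholds `sfm_min` and `env_max` are illustrative assumptions. For simplicity, the checks operate on a whole time-domain frame rather than per coder band.

```python
import numpy as np

def spectral_flatness(frame):
    """Geometric over arithmetic mean of the power spectrum: around 0.56 for white noise, near 0 for tones."""
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2 + 1e-12
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))

def envelope_peak_ratio(frame, n_sub=8):
    """Largest sub-block energy relative to the mean sub-block energy (about 1 for stationary signals)."""
    energies = np.array([np.sum(b ** 2) for b in np.array_split(frame, n_sub)]) + 1e-12
    return float(energies.max() / energies.mean())

def pns_allowed(frame, sfm_min=0.3, env_max=4.0):
    """Hypothetical control: substitute only if the frame is neither tonal nor transient."""
    tonal = spectral_flatness(frame) < sfm_min          # "tonality check"
    transient = envelope_peak_ratio(frame) > env_max    # "time envelope check"
    return not tonal and not transient

rng = np.random.default_rng(0)
n, fs = 1024, 48000
noise = rng.standard_normal(n)
tone = np.sin(2 * np.pi * 1000 * np.arange(n) / fs)
click = 0.01 * rng.standard_normal(n)
click[600:640] += 10 * rng.standard_normal(40)   # short burst acting as a transient

for name, frame in [("noise", noise), ("tone", tone), ("click", click)]:
    print(name, pns_allowed(frame))
# expected: noise True, tone False, click False
```

Only the stationary noise frame passes both checks; the tone fails the tonality check and the burst fails the time envelope check, mirroring the two failure modes heard in “Glockenspiel” and “Male Speech”.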

It can be clearly heard how each of the two checks prevents artifacts and how, when applied together, they provide proper control of PNS processing.

In Figure 1, the influence of the different control schemes is illustrated for the glockenspiel by marking spectral parts that have been substituted by PNS in grey.

a) PNS always enabled above 4 kHz
b) PNS with tonality check
c) PNS with time envelope check
d) PNS with time envelope and tonality check
Figure 1: Visualization of different PNS control schemes

References

[1] J. Herre, D. Schulz: "Extending the MPEG-4 AAC Codec by Perceptual Noise Substitution", 104th AES Convention, Amsterdam, 1998, Preprint 4720.
[2] J. Herre, S. Disch: "Perceptual Audio Coding", in R. Chellappa and S. Theodoridis (Eds.), Academic Press Library in Signal Processing, Volume 4, Academic Press, Elsevier Ltd., Oxford, 2013, pp. 757-799, ISBN 978-0123965011.
[3] J. Herre, M. Dietz: "Standards in a Nutshell: MPEG-4 High-Efficiency AAC Coding", IEEE Signal Processing Magazine, Volume 25, Issue 3, pp. 137-142, May 2008.
[4] S. Quackenbush: "MPEG Unified Speech and Audio Coding", IEEE MultiMedia, Volume 20, Issue 2, pp. 72-78, 2013.
[5] S. Disch, A. Niedermeier, C. R. Helmrich, C. Neukam, K. Schmidt, R. Geiger, J. Lecomte, F. Ghido, F. Nagel, B. Edler: "Intelligent Gap Filling in Perceptual Transform Coding of Audio", 141st AES Convention, Audio Engineering Society, New York, NY, USA, 2016.