Blind Upmix for Applause-Like Signals Based on Perceptual Plausibility Criteria

A. Adami, L. Brand, S. Disch and J. Herre

Abstract

Applause is the result of many individuals rhythmically clapping their hands. Applause recordings exhibit a characteristic temporal, timbral and spatial structure: claps originating from a distinct direction (i.e., from a particular person) usually have a similar timbre and recur quasi-periodically. Traditional approaches to blind mono-to-stereo upmixing do not consider these properties and may therefore produce output whose perceptual quality suffers from a lack of plausibility. In this paper, we propose a blind upmixing approach for applause-like signals that aims to preserve the natural structure of applause by incorporating the periodicity and timbral similarity of claps into the upmix process, thereby supporting the plausibility of the artificially generated spatial scene [1].

Applause Separation Demo

[Figure: block diagram of the applause decomposition processing]


The applause decomposition proposed in this paper is a modified version of the approaches used in [2,3]. The figure shows a block diagram of the basic structure of the decomposition processing. In the energy extraction stage, an instantaneous energy estimate and an average energy estimate are derived from the input applause signal, and their ratio is computed. This ratio is gated/thresholded to obtain a separation gain, which is applied to the input signal to yield a foreground signal containing the individually perceivable claps and a background signal containing the more noise-like remainder. Below, some sound examples of the decomposition are presented.
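The processing chain described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the paper: the function name `decompose_applause`, the moving-average energy estimators, the window lengths `inst_ms` and `avg_ms`, the `threshold` value and the hard (binary) gate are all hypothetical choices made for this example.

```python
import numpy as np

def decompose_applause(x, fs, inst_ms=2.0, avg_ms=200.0, threshold=2.5):
    """Split a mono applause signal into foreground and background.

    All parameter values are illustrative assumptions, not values
    taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    power = x ** 2

    def moving_average(p, length_ms):
        # Rectangular smoothing window of the requested duration.
        n = max(1, int(round(fs * length_ms / 1000.0)))
        return np.convolve(p, np.ones(n) / n, mode="same")

    e_inst = moving_average(power, inst_ms)  # instantaneous energy estimate
    e_avg = moving_average(power, avg_ms)    # average energy estimate
    ratio = e_inst / (e_avg + 1e-12)         # small constant avoids division by zero

    # Hard gate: 1 where the short-term energy clearly exceeds the average.
    gain = (ratio > threshold).astype(float)

    foreground = gain * x          # individually perceivable claps
    background = (1.0 - gain) * x  # noise-like remainder
    return foreground, background, gain
```

Because the two gains sum to one at every sample, the foreground and background add up to the input again. A smoothed or soft gain would reduce switching artifacts, at the cost of some leakage between the two signals.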

2 people

[Figure: Mix_2_time]

4 people

[Figure: Mix_4_time]

8 people

[Figure: Mix_8_time]

16 people

[Figure: Mix_16_time]

32 people

[Figure: Mix_32_time]

64 people

[Figure: Mix_64_time]

128 people

[Figure: Mix_128_time]


Blind Upmix Demo

[Figure: applauseUpmix]


The upmix of the foreground signals was based on perceptual plausibility criteria: it exploited the assumptions that claps originating from a particular person exhibit

  • similar spectral envelopes and
  • some form of temporal periodicity.
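To make the first criterion concrete, the sketch below groups claps by the similarity of their spectral envelopes and gives every group its own stereo position. It is a hedged illustration only and does not reproduce the paper's upmix: the greedy grouping, the cosine-similarity measure, the `min_similarity` threshold, the even spread of pan positions and both function names are assumptions made for this example, and the periodicity criterion is not modeled at all.

```python
import numpy as np

def group_claps_by_envelope(envelopes, min_similarity=0.9):
    """Greedily assign each clap to a hypothetical clapper.

    envelopes: array of shape (n_claps, n_bands) with non-negative
    band energies, one row per detected clap (assumed input format).
    Returns one integer label per clap.
    """
    envelopes = np.asarray(envelopes, dtype=float)
    norms = np.linalg.norm(envelopes, axis=1, keepdims=True)
    unit = envelopes / (norms + 1e-12)

    centroids = []  # running sum of unit envelopes per clapper
    labels = []
    for e in unit:
        # Cosine similarity between this clap and every known clapper.
        sims = [float(e @ (c / (np.linalg.norm(c) + 1e-12))) for c in centroids]
        if sims and max(sims) >= min_similarity:
            k = int(np.argmax(sims))
            centroids[k] = centroids[k] + e
        else:
            # No sufficiently similar clapper yet: open a new one.
            k = len(centroids)
            centroids.append(e.copy())
        labels.append(k)
    return labels

def pan_positions(labels):
    """Spread the clappers evenly between left (-1) and right (+1)."""
    n = max(labels) + 1
    positions = np.linspace(-1.0, 1.0, n) if n > 1 else np.array([0.0])
    return [float(positions[k]) for k in labels]
```

With this grouping, claps with similar timbre end up at the same position in the stereo image, which is the effect the plausibility criterion asks for. A fuller sketch would also check that the onsets within a group recur at roughly regular intervals.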

The background signal was decorrelated using a modified version of the method proposed in [4] to yield a wider stereo background. Below, some sound examples of the resulting blindly upmixed signals are presented.
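As a rough illustration of what decorrelating the background achieves, the following sketch derives a second channel by filtering the background with a random-phase, unit-magnitude FIR filter. This is a generic decorrelator swapped in for illustration; it is not the method of [4] nor the modified version used here. The function name `decorrelate_background` and the parameters `n_fft` and `seed` are hypothetical.

```python
import numpy as np

def decorrelate_background(bg, n_fft=1024, seed=0):
    """Return (left, right) channels with low inter-channel correlation.

    Generic random-phase all-pass decorrelator, used here only as a
    stand-in; the filter length and seed are arbitrary assumptions.
    """
    bg = np.asarray(bg, dtype=float)
    rng = np.random.default_rng(seed)

    # Unit magnitude with random phase at every frequency bin.
    phase = rng.uniform(-np.pi, np.pi, n_fft // 2 + 1)
    phase[0] = 0.0   # DC bin must be real
    phase[-1] = 0.0  # Nyquist bin must be real
    h = np.fft.irfft(np.exp(1j * phase), n=n_fft)

    left = bg.copy()
    right = np.convolve(bg, h)[: len(bg)]  # truncate to the input length
    return left, right
```

A filter of this kind smears transients in time, which is one reason to apply decorrelation to the noise-like background only and to treat the foreground claps separately.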

2 people

4 people

8 people

16 people

32 people

64 people

128 people

References

  1. Adami, A., Brand, L., Disch, S. and Herre, J., "Blind Upmix for Applause-Like Signals Based on Perceptual Plausibility Criteria", In Proceedings of the 20th International Conference on Digital Audio Effects (DAFx-17), pages 496–501, Edinburgh, UK, 2017.
  2. Adami, A. and Herre, J., "Perception and Measurement of Applause Characteristics: Wahrnehmung und Messung von Applauseigenschaften", In Proceedings of the 29th Tonmeistertagung (TMT29), pages 199–206, Cologne, Germany, 2016.
  3. Adami, A., Brand, L. and Herre, J., "Investigations Towards Plausible Blind Upmixing of Applause Signals", In 142nd International Convention of the AES, Berlin, Germany, 2017.
  4. Hotho, G., van de Par, S. and Breebaart, J., "Multichannel Coding of Applause Signals", EURASIP Journal on Advances in Signal Processing, vol. 2008, 2008.