This webpage provides supplementary information for the paper 'Towards Controllable Audio Texture
Morphing', submitted to ICASSP 2023.

A GAN consists of two neural networks, a generator G and a discriminator D, trained in opposition to
one another. The model samples a random latent vector z from a spherical Gaussian and appends a
conditional vector to it, with the goal of controlling the conditional parameters independently of
z. This input vector is passed through G, a stack of transposed convolutions that upsample it to
generate the output data X_{fake} = G(z). The output is fed into D, which consists of downsampling
convolutions mirroring the architecture of G and estimates a divergence measure, the Wasserstein
distance, between the real (X_{real}) and generated (X_{fake}) distributions [1]. To encourage the
generator to use the conditional information, an *auxiliary classification* (AC-criterion) loss is
added to the discriminator, which learns to predict the conditional vector [2].
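
As a minimal sketch of how the generator input described above could be assembled, the following example samples a latent vector from a spherical Gaussian and appends a conditional vector to it. The latent size of 128, the example conditional values, and the helper names `sample_latent` and `generator_input` are assumptions made for illustration and are not taken from the paper.

```python
import random


def sample_latent(dim, rng):
    """Draw one latent vector from a spherical (zero-mean, unit-variance) Gaussian."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def generator_input(z, cond):
    """Append the conditional vector to the latent vector."""
    return list(z) + list(cond)


rng = random.Random(0)       # fixed seed so the sketch is repeatable
LATENT_DIM = 128             # hypothetical latent size
z = sample_latent(LATENT_DIM, rng)
cond = [0.5, 1.0, 0.0]       # hypothetical conditional vector
g_in = generator_input(z, cond)

# The generator would receive a vector of length LATENT_DIM + len(cond).
print(len(g_in))
```

Because the conditional entries occupy fixed positions at the end of the input, they can be varied while z is held fixed, which is the kind of independent control the text describes.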

We consider two types of conditional parameters: control parameters P and class parameters C. We
define *control parameters* as the parameters that are intended to be controlled within an
audio texture class. For example, *strength* is a control parameter for the audio texture
class wind. For 11 possible values of strength, P is 11-dimensional for One-Hot GAN and
1-dimensional for MorphGAN. For a two-class experiment, C is 2-dimensional for One-Hot GAN and
3-dimensional for MorphGAN. We use the phase gradient heap integration (PGHI) representation as
shown in Gupta et al. [3].
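
The difference in dimensionality of P can be illustrated with a short sketch. The one-hot encoding follows directly from the description above; the scalar encoding, which maps the 11 strength values linearly onto [0, 1], is only an assumption about how a 1-dimensional P might be formed. The function names are hypothetical, and the 3-dimensional MorphGAN class vector is not sketched because its construction is not specified here.

```python
def one_hot(index, size):
    """Return a one-hot vector of length `size` with a 1.0 at `index`."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]


def scalar_control(index, num_values):
    """Assumed 1-dimensional encoding: map index 0..num_values-1 onto [0, 1]."""
    if not 0 <= index < num_values:
        raise ValueError("index out of range")
    return [index / (num_values - 1)]


NUM_STRENGTHS = 11   # number of strength values, as in the text
NUM_CLASSES = 2      # two-class experiment, as in the text

strength_index = 5   # hypothetical choice of strength level
p_one_hot = one_hot(strength_index, NUM_STRENGTHS)        # 11-dimensional P
p_scalar = scalar_control(strength_index, NUM_STRENGTHS)  # 1-dimensional P (assumed form)
c_one_hot = one_hot(0, NUM_CLASSES)                       # 2-dimensional C

print(len(p_one_hot), len(p_scalar), len(c_one_hot))
```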

[1] Arjovsky, M., Chintala, S., & Bottou, L. (2017, July). Wasserstein generative adversarial networks. In International conference on machine learning (pp. 214-223). PMLR.

[2] Odena, A., Olah, C., & Shlens, J. (2017, July). Conditional image synthesis with auxiliary classifier GANs. In International conference on machine learning (pp. 2642-2651). PMLR.

[3] Gupta, C., Kamath, P., & Wyse, L. (2021). Signal representations for synthesizing audio textures with generative adversarial networks. In Sound and Music Computing (SMC).
