Signal Representations For Audio Texture Synthesis Using Generative Adversarial Networks

 
 
 
 
 
Our code is available on GitHub here
 
 
 

Simulating estimation error of IF

 
 
The IF method is robust to estimation errors for temporally stable sounds with clearly separated frequency components, but it is sensitive to estimation errors for short-duration, noisy, wideband sounds with closely spaced frequency components.
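One way such a simulation could be set up is sketched below. It is a minimal illustration, not the code used for the audio examples on this page: the helper names (`phase_to_if`, `if_to_phase`, `perturb_if`), the additive Gaussian noise model, and the value of `sigma` are all assumptions made for the example. IF is taken as the frame-to-frame difference of the unwrapped phase, noise is added to it, and the phase is rebuilt by a cumulative sum over time.

```python
import numpy as np

def phase_to_if(phase):
    """IF as the time difference of the unwrapped phase (bins x frames).
    The first frame keeps the initial phase so the transform is invertible."""
    unwrapped = np.unwrap(phase, axis=1)
    return np.concatenate([unwrapped[:, :1], np.diff(unwrapped, axis=1)], axis=1)

def if_to_phase(inst_freq):
    """Rebuild the phase by cumulatively summing IF along the time axis."""
    return np.cumsum(inst_freq, axis=1)

def perturb_if(inst_freq, sigma, rng):
    """Hypothetical error model: additive Gaussian noise on every IF value."""
    return inst_freq + rng.normal(0.0, sigma, size=inst_freq.shape)

def wrapped_error(a, b):
    """Absolute phase difference wrapped to [0, pi]."""
    return np.abs(np.angle(np.exp(1j * (a - b))))

rng = np.random.default_rng(0)
phase = rng.uniform(-np.pi, np.pi, size=(4, 50))  # toy phase spectrogram

inst_freq = phase_to_if(phase)
clean_err = wrapped_error(if_to_phase(inst_freq), phase).max()

noisy_phase = if_to_phase(perturb_if(inst_freq, sigma=0.1, rng=rng))
noisy_err = wrapped_error(noisy_phase, phase)
```

Because the phase is a running sum of IF values, an error injected at one frame is carried into every later frame of that bin, so `noisy_err` can be inspected along the time axis to see how the perturbation accumulates.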
 
 
 
 
 
Nsynth: Original Audio, Resynthesized Audio
Pop: Original Audio, Resynthesized Audio
 
 
 
 
 
 
 
 

Analysing GAN-estimated outputs

 
 
We present a visual analysis of the GAN-estimated spectrograms obtained with the IF and PGHI methods.
Note: In the IF method, the phase and the phase gradient in the frequency direction are estimated from the GAN-estimated IF spectrogram. In the PGHI method, the phase and the phase gradients in both the frequency and time directions are all estimated from the GAN-estimated log-magnitude spectrogram.
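The IF side of this note can be illustrated with a small sketch. It assumes the IF spectrogram is stored as per-frame phase increments in radians (bins x frames); the function names are hypothetical and this is not the code used to produce the figures. The phase follows from a cumulative sum over time, and the frequency-direction gradient from a wrapped difference between neighbouring bins. The PGHI side, which works from the log-magnitude spectrogram, is not sketched here.

```python
import numpy as np

def wrap(x):
    """Wrap angles to the principal interval (-pi, pi]."""
    return np.angle(np.exp(1j * x))

def phase_from_if(inst_freq):
    """Phase as the cumulative sum of IF along the time axis (axis 1)."""
    return np.cumsum(inst_freq, axis=1)

def freq_gradient(phase):
    """Phase gradient in the frequency direction: wrapped difference
    between adjacent frequency bins (axis 0)."""
    return wrap(np.diff(phase, axis=0))

# Toy IF spectrogram: bin k advances by 0.1 * (k + 1) rad per frame.
inst_freq = np.tile(0.1 * np.arange(1, 4)[:, None], (1, 5))

phase = phase_from_if(inst_freq)   # shape (3, 5)
grad_f = freq_gradient(phase)      # shape (2, 5)
```

In this toy case adjacent bins drift apart by 0.1 rad per frame, so each row of `grad_f` grows linearly with the frame index.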
 
 
 
 
 
Nsynth: IF reconstruction, PGHI reconstruction
 
 
 
 
 
Pops: IF reconstruction, PGHI reconstruction
 
 
 
 
 
Chirps: IF reconstruction, PGHI reconstruction
 
 
 
 
 
 
 
 
 

Examples of the generated audio used for the listening tests

 
 
 
 
Nsynth (reconstructions using hop length = 128)
Original sample reference, IF reconstruction, PGHI reconstruction

Nsynth (reconstructions using hop length = 64)
Original sample reference, IF reconstruction, PGHI reconstruction

Chirps (reconstructions using hop length = 128)
Original sample reference, IF reconstruction, PGHI reconstruction

Chirps (reconstructions using hop length = 64)
Original sample reference, IF reconstruction, PGHI reconstruction

Pops (reconstructions using hop length = 128)
Original sample reference, IF reconstruction, PGHI reconstruction

Pops (reconstructions using hop length = 64)
Original sample reference, IF reconstruction, PGHI reconstruction