MPEG-1 psychoacoustic model emulation using multiscale convolutional neural networks

Research output: Contribution to journal › Article › Peer-reviewed

Abstract

The Moving Picture Experts Group-1 (MPEG-1) perceptual audio compression scheme is a successful family of audio codecs described in the ISO/IEC 11172-3 standard. There is currently no general framework for emulating the MPEG-1 psychoacoustic model or any other psychoacoustic model, even though such models are a core component of many perceptual codecs. This work presents a successful implementation of a convolutional neural network, termed “MCNN-PM” (Multiscale Convolutional Neural Network – Psychoacoustic Model), that emulates psychoacoustic model 1 of the MPEG-1 standard. The network is then integrated into the MPEG-1 Layer I codec. When audio quality is evaluated with the objective difference grade (ODG), the MCNN-PM MPEG-1 Layer I codec outperforms the original MPEG-1 Layer I codec by up to 17% at 96 kbps and 14% at 128 kbps. At 192 kbps the two codecs perform almost equally. This work shows that convolutional neural networks are a viable alternative to standard psychoacoustic models and can be used successfully as part of perceptual audio codecs.

Original language: English
Pages (from-to): 6963-6974
Number of pages: 12
Journal: Multimedia Tools and Applications
Volume: 83
Issue number: 3
DOI
Status: Published - Jan 2024
