
MPEG-1 psychoacoustic model emulation using multiscale convolutional neural networks

  • Universidad Peruana de Ciencias Aplicadas

Research output: Contribution to journal › Article › peer-review

Abstract

The Moving Picture Experts Group-1 (MPEG-1) perceptual audio compression scheme is a successful family of audio codecs described in standard ISO/IEC 11172–3. Currently, there is no general framework for emulating the MPEG-1 psychoacoustic model or any other psychoacoustic model, even though such models are a core component of many perceptual codecs. This work presents a convolutional neural network, termed “MCNN-PM” (Multiscale Convolutional Neural Network – Psychoacoustic Model), that successfully emulates psychoacoustic model 1 from the MPEG-1 standard. The network is then integrated into the MPEG-1 Layer I codec. With audio quality evaluated using the objective difference grade (ODG), the MCNN-PM MPEG-1 Layer I codec outperforms the original MPEG-1 Layer I codec by up to 17% at 96 kbps and 14% at 128 kbps, and performs almost equally at 192 kbps. These results show that convolutional neural networks are a viable alternative to standard psychoacoustic models and can be used successfully as part of perceptual audio codecs.

Original language: English
Pages (from-to): 6963-6974
Number of pages: 12
Journal: Multimedia Tools and Applications
Volume: 83
Issue number: 3
State: Published - Jan 2024

Keywords

  • MPEG
  • audio coding
  • neural networks
  • perceptual coding
  • psychoacoustic model
