Abstract
The Moving Picture Experts Group 1 (MPEG-1) perceptual audio compression scheme is a successful family of audio codecs described in the standard ISO/IEC 11172–3. Currently, there is no general framework for emulating the MPEG-1 psychoacoustic model or any other psychoacoustic model, even though such a model is a core component of many perceptual codecs. This work presents a convolutional neural network that emulates psychoacoustic model 1 from the MPEG-1 standard, termed “MCNN-PM” (Multiscale Convolutional Neural Network – Psychoacoustic Model). The network is then integrated into the MPEG-1 Layer I codec. Using the objective difference grade (ODG) to evaluate audio quality, the MCNN-PM MPEG-1 Layer I codec outperforms the original MPEG-1 Layer I codec by up to 17% at 96 kbps and 14% at 128 kbps, and performs almost equally at 192 kbps. This work shows that convolutional neural networks are a viable alternative to standard psychoacoustic models and can be used successfully as part of perceptual audio codecs.
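To illustrate what it means for a network to stand in for the psychoacoustic model inside a Layer I encoder, here is a minimal sketch. It is not the authors' implementation. It assumes the encoder only needs one signal-to-mask ratio (SMR) per subband from the model, and it replaces the standard's quantizer tables with a rough 6.02 dB-per-bit rule and a plain greedy loop. All names (`PsyModel`, `allocate_bits`, `encode_frame_allocation`, `placeholder_model`) are hypothetical.

```python
from typing import Callable, List, Sequence

NUM_SUBBANDS = 32   # MPEG-1 Layer I splits each frame into 32 subbands
MAX_BITS = 15       # upper limit on bits per subband sample in Layer I
DB_PER_BIT = 6.02   # assumption: rough SNR gain per bit, not the standard's table

# For the encoder, a psychoacoustic model is just a function from one frame
# of PCM samples to one SMR value (in dB) per subband.
PsyModel = Callable[[Sequence[float]], List[float]]


def allocate_bits(smr_db: Sequence[float], bit_budget: int) -> List[int]:
    """Simplified greedy allocation: give the next bit to the subband
    with the lowest mask-to-noise ratio, MNR = SNR - SMR."""
    bits = [0] * NUM_SUBBANDS
    for _ in range(bit_budget):
        candidates = [
            (DB_PER_BIT * bits[sb] - smr_db[sb], sb)
            for sb in range(NUM_SUBBANDS)
            if bits[sb] < MAX_BITS
        ]
        if not candidates:
            break  # every subband is already at the maximum
        _, worst = min(candidates)
        bits[worst] += 1
    return bits


def encode_frame_allocation(
    frame: Sequence[float], psy_model: PsyModel, bit_budget: int
) -> List[int]:
    """Run whichever model is plugged in, then allocate bits from its SMRs."""
    smr_db = psy_model(frame)
    if len(smr_db) != NUM_SUBBANDS:
        raise ValueError("psychoacoustic model must return one SMR per subband")
    return allocate_bits(smr_db, bit_budget)


def placeholder_model(frame: Sequence[float]) -> List[float]:
    # Hypothetical stand-in: a real model (standard or neural) would
    # analyse the frame; this one returns a fixed, falling SMR curve.
    return [20.0 - sb for sb in range(NUM_SUBBANDS)]


frame = [0.0] * 384  # one Layer I frame holds 384 PCM samples
bits = encode_frame_allocation(frame, placeholder_model, bit_budget=40)
```

Because the encoder depends only on the `PsyModel` signature in this sketch, the standard model and a trained network's inference function could be swapped without touching the allocation step.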
| Original language | English |
|---|---|
| Pages (from-to) | 6963-6974 |
| Number of pages | 12 |
| Journal | Multimedia Tools and Applications |
| Volume | 83 |
| Issue number | 3 |
| State | Published - Jan 2024 |
Keywords
- MPEG
- audio coding
- neural networks
- perceptual coding
- psychoacoustic model
Title
MPEG-1 psychoacoustic model emulation using multiscale convolutional neural networks