Emotional 3D speech visualization from 2D audio visual data

  • Universidad Peruana de Ciencias Aplicadas

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Visual speech is hard to recreate by hand because animation is a time-consuming task: precision and detail must both be considered and must match the expectations of the developers and, above all, those of the audience. To address this problem, several approaches have been designed to accelerate the animation of characters' faces, such as procedural animation and speech-lip synchronization; the most common research areas for these methods are Computer Vision and Machine Learning. In general, however, these tools can suffer from any of the following main problems: difficulty adapting to another language, subject, or animation software; high hardware requirements; or results that can be perceived as robotic. Our work presents a Deep Learning model for automatic expressive facial animation using audio. We extract generic audio features from expressive, phoneme-rich speech audio for non-language-specific speech processing and emotion recognition. From the videos used for training, we extract facial landmarks as frame-level speech targets and have the model learn the animation for phoneme pronunciation. We evaluated four variants of our model (two loss functions, each with and without emotion conditioning) through a user perception survey, in which the variant using a reconstruction loss function with emotion conditioning during training produced the most natural results and the best synchronization score, with the approval of the majority of interviewees. For perceived naturalness, it obtained 38.89% of the total approval votes, and for language synchronization it obtained the highest average score, 65.55% (98.33 of 150 total points), across English, German, and Korean.
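The abstract does not define its reconstruction loss. As a minimal sketch only, assuming the common form of mean squared error between predicted and ground-truth landmark coordinates, the idea could look like the following. The function name `reconstruction_loss` and the landmark layout (one coordinate sequence per landmark) are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def reconstruction_loss(
    predicted: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
) -> float:
    """Mean squared error over all landmark coordinates.

    Assumption: this is a generic reconstruction loss, not necessarily
    the one used by the authors. Each landmark is a sequence of
    coordinates, e.g. (x, y) or (x, y, z).
    """
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("landmark lists must be non-empty and of equal length")

    squared_error_sum = 0.0
    coordinate_count = 0
    for pred_point, true_point in zip(predicted, target):
        if len(pred_point) != len(true_point):
            raise ValueError("each landmark pair must have the same dimension")
        for p, t in zip(pred_point, true_point):
            squared_error_sum += (p - t) ** 2
            coordinate_count += 1
    return squared_error_sum / coordinate_count
```

Under this assumption, a prediction that matches the target landmarks exactly gives a loss of zero, and the loss grows with the squared distance between predicted and target coordinates.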

Original language: English
Article number: 2450002
Journal: International Journal of Modeling, Simulation, and Scientific Computing
Volume: 14
Issue number: 5
DOIs
State: Published - 1 Oct 2023

Keywords

  • audio-visual speech
  • procedural animation
  • Speech animation
