TY - JOUR
T1 - Model for Real-Time Subtitling from Spanish to Quechua Based on Cascade Speech Translation
AU - Alvarez-Crespo, Abraham
AU - Miranda-Salazar, Diego
AU - Ugarte, Willy
N1 - Publisher Copyright: © 2023 by SCITEPRESS – Science and Technology Publications, Lda.
PY - 2023
Y1 - 2023
N2 - The linguistic identity of many indigenous peoples has gained relevance in recent years. The speed at which many of them have been lost confronts many countries with a serious reduction of their cultural heritage. In South America, the vulnerability of many native languages is alarming. Even languages such as Quechua, widely spoken in the region, face early disappearance due to a low rate of inter-generational transmission. Aware of this problem, we propose a Spanish-to-Quechua speech-to-text translation model that could be used as a resource for language revitalization, using cinema to promote and retain the language. To realize it, we developed a cascade architecture that combines a Spanish speech recognition module with our proposed Quechua machine translation module, fine-tuned from a Turkish NMT model on a public parallel dataset. Additionally, we developed a subtitling algorithm that, together with our solution, is integrated into a real-time subtitling desktop application for short film clips. In the tests we carried out, we obtained better BLEU and chrF scores than previous proposals, and the Quechua speakers consulted validated both the translations shown in the subtitles and the tool’s value.
AB - The linguistic identity of many indigenous peoples has gained relevance in recent years. The speed at which many of them have been lost confronts many countries with a serious reduction of their cultural heritage. In South America, the vulnerability of many native languages is alarming. Even languages such as Quechua, widely spoken in the region, face early disappearance due to a low rate of inter-generational transmission. Aware of this problem, we propose a Spanish-to-Quechua speech-to-text translation model that could be used as a resource for language revitalization, using cinema to promote and retain the language. To realize it, we developed a cascade architecture that combines a Spanish speech recognition module with our proposed Quechua machine translation module, fine-tuned from a Turkish NMT model on a public parallel dataset. Additionally, we developed a subtitling algorithm that, together with our solution, is integrated into a real-time subtitling desktop application for short film clips. In the tests we carried out, we obtained better BLEU and chrF scores than previous proposals, and the Quechua speakers consulted validated both the translations shown in the subtitles and the tool’s value.
KW - Machine Translation
KW - Quechua Revitalization
KW - Speech Recognition
UR - https://www.scopus.com/pages/publications/85184959351
U2 - 10.5220/0011783300003393
DO - 10.5220/0011783300003393
M3 - Conference article
AN - SCOPUS:85184959351
SN - 2184-3589
VL - 3
SP - 837
EP - 844
JO - International Conference on Agents and Artificial Intelligence
JF - International Conference on Agents and Artificial Intelligence
T2 - 15th International Conference on Agents and Artificial Intelligence, ICAART 2023
Y2 - 22 February 2023 through 24 February 2023
ER -