Automatic collection of transcribed speech for low resources languages

AGUIAR, Thales and DA COSTA ABREU, Marjory (2023). Automatic collection of transcribed speech for low resources languages. In: 2023 IEEE 13th International Conference on Pattern Recognition Systems (ICPRS). IEEE.

ttbacc_dataset_paper.pdf - Accepted Version
Creative Commons Attribution.

Official URL: https://ieeexplore.ieee.org/document/10179033
Link to published version: https://doi.org/10.1109/icprs58416.2023.10179033

Abstract

Speech is crucial for human communication, and its importance has grown with the rise of voice-based instant messaging and automated chatbots. While most speech technologies have achieved high accuracy, they fail when tested on accents that deviate from the “standard” of a language. This is more concerning for languages that lack datasets and have scarce literature, such as Brazilian Portuguese. Thus, this paper proposes a methodology to collect and release a speech dataset for Brazilian Portuguese. The method draws on the data and information available on video platforms and automatically extracts the audio from TEDx Talks.

Item Type: Book Section
Identification Number: https://doi.org/10.1109/icprs58416.2023.10179033
SWORD Depositor: Symplectic Elements
Depositing User: Symplectic Elements
Date Deposited: 21 Jul 2023 08:44
Last Modified: 11 Oct 2023 13:15
URI: https://shura.shu.ac.uk/id/eprint/32178
