AGUIAR, Thales and DA COSTA ABREU, Marjory (2023). Automatic collection of transcribed speech for low resources languages. In: 2023 IEEE 13th International Conference on Pattern Recognition Systems (ICPRS). IEEE. [Book Section]
Documents
ttbacc_dataset_paper.pdf - Accepted Version
Available under License Creative Commons Attribution.
Abstract
Speech is crucial for human communication, and with the evolution of instant messaging in voice format as well as automated chatbots, its importance has grown. While the majority of speech technologies have achieved high accuracy, they fail when tested on accents that deviate from the “standard” of a language. This is more concerning for languages that lack datasets and have scarce literature, like Brazilian Portuguese. Thus, this paper proposes a methodology to collect and release a speech dataset for Brazilian Portuguese. The method exploits the availability of data and information on video platforms and automatically extracts the audio from TEDx Talks.