A Parallel Cross-Lingual Benchmark for Multimodal Idiomaticity Understanding

TORUNOĞLU-SELAMET, Dilara, ARSLAN, Doğukan, WILKENS, Rodrigo, HE, Wei, ERYIĞIT, Doruk, PICKARD, Thomas, PAGANO, Adriana S, VILLAVICENCIO, Aline, ERYIĞIT, Gülşen, ABUCZKI, Ágnes, CARDOSO, Aida, LAZARENKA, Alesia, ALMASSOVA, Dina, MENDES, Amália, KANELLOPOULOU, Anna, BROSA-RODRIGUEZ, Antoni, VALKOVSKA, Baiba, WOJTOWICZ, Beata, PEDERSEN, Bolette, HIDALGO-TERNERO, Carlos Manuel, LIEBESKIND, Chaya, JOKIĆ, Danka, ALVES, Diego, TRIANTAFYLLIDI, Eleni, VELLDAL, Erik, PHILIPPY, Fred, OLESKEVICIENE, Giedre Valunaite, RIZGELIENE, Ieva, SKADINA, Inguna, LOBZHANIDZE, Irina, HAUGEN, Isabell Stinessen, KRITO, Jauza Akbar, MARKOVIĆ, Jelena M, MONTI, Johanna, SAUCA, Josue Alejandro, DOBROVOLJC, Kaja, UGWUANYI, Kingsley O, RITUMA, Laura, ØVRELID, Lilja, AGRO, Maha Tufail, ABJALOVA, Manzura, CHATZIGRIGORIOU, Maria, RAMOS, María del Mar Sánchez, PENDEVSKA, Marija, SEYYEDREZAEI, Masoumeh, SHAMSFARD, Mehrnoush, AHSAN, Momina, KHAN, Muhammad Ahsan Riaz, NORMAN, Nathalie Carmen Hau, AYYILDIZ, Nilay Erdem, HOSSEINI-KIVANANI, Nina, LIGETI-NAGY, Noémi, NAEEM, Numaan, KANISHCHEVA, Olha, YATSYSHYNA, Olha, OREL, Daniil, GIOMMARELLI, Petra, OSENOVA, Petya, GARABIK, Radovan, SEMOU, Regina E, REBECHI, Rozane, PRANIDA, Salsabila Zahirah, TOUILEB, Samia, NIMB, Sanni, AHMAD, Sarfraz, SHARIPOVA, Sarvinoz, GOLAN, Shahar, JI, Shaoxiong, ABOH, Sopuruchi Christian, SUCUR, Srdjan, MARKANTONATOU, Stella, OLSEN, Sussi, TAJALLI, Vahide, LIPP, Veronika, GIOULI, Voula, ERAYDIN, Yelda Yeşildal, SAABERI, Zahra and XIE, Zhuohan (2026). A Parallel Cross-Lingual Benchmark for Multimodal Idiomaticity Understanding. In: PIPERIDIS, Stelios, BEL, Núria, VAN DEN HEUVEL, Henk, IDE, Nancy, KREK, Simon and TORAL, Antonio, (eds.) The Fifteenth Language Resources and Evaluation Conference (LREC 2026). Palma, Mallorca, Spain, European Language Resources Association (ELRA), 9434-9448. [Book Section]

Documents
ADMIRE2___CoreGroup_LREC_Prep.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial.

Abstract
Potentially idiomatic expressions (PIEs) carry meanings inherently tied to the everyday experience of a given language community. As such, they constitute an interesting challenge for assessing the linguistic (and to some extent cultural) capabilities of NLP systems. In this paper, we present XMPIE, a parallel multilingual and multimodal dataset of potentially idiomatic expressions. The dataset, covering 34 languages and over ten thousand items, allows comparative analyses of idiomatic patterns across language-specific realisations and preferences in order to gather insights about shared cultural aspects. Because the dataset is parallel, it allows evaluating language model performance on a given PIE in different languages and testing whether idiomatic understanding in one language transfers to another. Moreover, the dataset supports the study of PIEs across textual and visual modalities, to measure to what extent PIE understanding in one modality (text vs. image) transfers to, or implies, understanding in the other. The data was created by language experts, with both textual and visual components crafted under multilingual guidelines. Each PIE is accompanied by five images representing a spectrum from idiomatic to literal meanings, including semantically related and random distractors. The result is a high-quality benchmark for evaluating multilingual and multimodal idiomatic language understanding.