Corpora for sentiment analysis of Arabic text in social media

ITANI, Maher, ROAST, Chris and AL-KHAYATT, Samir (2017). Corpora for sentiment analysis of Arabic text in social media. In: 8th International Conference on Information and Communication Systems (ICICS). IEEE, 64-69.

Roast - Corpora for sentiment analysis (AM).pdf - Accepted Version
Available under License All rights reserved.

Link to published version: 10.1109/IACS.2017.7921947


Different Natural Language Processing (NLP) applications, such as text categorization and machine translation, need annotated corpora to check quality and performance. Similarly, sentiment analysis requires annotated corpora to test the performance of classifiers. Manual annotation performed by native speakers is used as a benchmark to measure how accurate a classifier is. In this paper we summarise currently available Arabic corpora and describe work in progress to build, annotate, and use Arabic corpora consisting of Facebook (FB) posts. The distinctive feature of these corpora is that they are based on posts written in Dialectal Arabic (DA), which do not follow specific grammatical or spelling standards. The corpora are annotated with five labels (positive, negative, dual, neutral, and spam). In addition to building the corpora, the paper illustrates how manual tagging can be used to extract opinionated words and phrases for use in a lexicon-based classifier.
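The abstract does not say how opinionated words are extracted from the manual tags or how the lexicon-based classifier scores a post. The Python sketch below is a hypothetical illustration only, not the authors' method. It assumes whitespace tokenisation and treats a word as positive or negative when it occurs in more posts tagged with that label, subject to a hypothetical `min_count` threshold. It then scores a post by summing word polarities and measures accuracy against the manual labels used as the benchmark. The function names and the toy posts are invented for illustration and are not drawn from the corpora.

```python
from collections import Counter, defaultdict

# The five annotation labels named in the abstract.
LABELS = ("positive", "negative", "dual", "neutral", "spam")


def build_lexicon(annotated_posts, min_count=2):
    """Hypothetical extraction step: derive word polarities from manually tagged posts.

    A word gets +1 if it appears in more positive than negative posts, -1 in the
    opposite case, and is skipped if it is tied or appears in fewer than
    `min_count` positive/negative posts in total.
    """
    label_counts = defaultdict(Counter)  # word -> number of posts per label
    for text, label in annotated_posts:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        for word in set(text.split()):  # count each word once per post
            label_counts[word][label] += 1

    lexicon = {}
    for word, counts in label_counts.items():
        pos, neg = counts["positive"], counts["negative"]
        if pos + neg < min_count or pos == neg:
            continue
        lexicon[word] = 1 if pos > neg else -1
    return lexicon


def classify(text, lexicon):
    """Sum word polarities; this simple sketch never predicts 'dual' or 'spam'."""
    score = sum(lexicon.get(word, 0) for word in text.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def accuracy(annotated_posts, lexicon):
    """Fraction of posts whose predicted label matches the manual (benchmark) label."""
    correct = sum(classify(text, lexicon) == gold for text, gold in annotated_posts)
    return correct / len(annotated_posts)


# Invented toy posts in Levantine-style dialect (not taken from the corpora).
TRAIN = [
    ("الاكل حلو", "positive"),          # "the food is nice"
    ("المكان حلو كتير", "positive"),     # "the place is very nice"
    ("الخدمة زفت", "negative"),          # "the service is awful"
    ("الاكل زفت", "negative"),           # "the food is awful"
    ("المكان مفتوح اليوم", "neutral"),   # "the place is open today"
    ("اربح جوال مجاني", "spam"),         # "win a free phone"
]
TEST = [
    ("الجو حلو", "positive"),
    ("الطريق زفت", "negative"),
    ("المكان مفتوح", "neutral"),
    ("اربح جوال مجاني", "spam"),
]

lexicon = build_lexicon(TRAIN)
print(lexicon)
print(accuracy(TEST, lexicon))  # 0.75
```

Because this sketch only outputs positive, negative, or neutral, the spam post in the toy test set is misclassified. Handling the dual and spam labels would need rules beyond a simple polarity sum.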

Item Type: Book Section
Uncontrolled Keywords: Sentiment Analysis, Arabic, Social Media, Informal Arabic
Research Institute, Centre or Group: Cultural Communication and Computing Research Institute > Communication and Computing Research Centre
Departments: Arts, Computing, Engineering and Sciences > Computing
Identification Number: 10.1109/IACS.2017.7921947
Depositing User: Chris Roast
Date Deposited: 01 Mar 2017 14:50
Last Modified: 22 Jul 2017 06:35
