Sentiment analysis and resources for informal Arabic text on social media

Itani, Maher

ITANI, Maher (2018). Sentiment analysis and resources for informal Arabic text on social media. Doctoral, Sheffield Hallam University. [Thesis]

Documents

PDF
Itani_2018_phd_SentimentAnalysisAnd.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Online content posted by Arab users on social networks generally does not abide by grammatical and spelling rules. These posts, or comments, are valuable because they contain users' opinions about different objects such as products, policies, institutions, and people. These opinions constitute important material for commercial and governmental institutions. Commercial institutions can use them to steer marketing campaigns, optimize their products, and learn the weaknesses and/or strengths of their products. Governmental institutions can use social network posts to gauge public opinion before or after legislating a new policy or law and to learn about the main issues that concern citizens. However, the huge size of online data and its noisy nature can hinder manual extraction and classification of the opinions present in online comments. Given the irregularity of dialectal Arabic (or informal Arabic), tools developed for formally correct Arabic are of limited use. This is specifically the case when they are employed in sentiment analysis (SA) where the target of the analysis is social media content. This research implemented a system that addresses this challenge. The work can be roughly divided into three blocks: building a corpus for SA and manually tagging it to check the performance of the constructed lexicon-based (LB) classifier; building a sentiment lexicon that consists of three different sets of patterns (negative, positive, and spam); and finally implementing a classifier that employs the lexicon to classify Facebook comments. In addition to providing resources for dialectal Arabic SA and classifying Facebook comments, this work categorises the reasons behind incorrect classification, provides preliminary solutions for some of them with a focus on negation, and uses regular expressions to detect the presence of lexemes. This work also illustrates how the constructed classifier works along with its different levels of reporting.
Moreover, it compares the performance of the LB classifier against a Naïve Bayes classifier and addresses how NLP tools such as part-of-speech (POS) tagging and Named Entity Recognition can be employed in SA. In addition, the work studies the performance of the implemented LB classifier and the developed sentiment lexicon when used to classify other corpora from the literature, and the performance of lexicons from the literature when used to classify the corpora constructed in this research. With minor changes, the classifier can be used for domain classification of documents (sports, science, news, etc.). The work ends with a discussion of research questions arising from the research reported.
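To make the lexicon-based approach described in the abstract more concrete, the following is a minimal sketch of a classifier that uses regular expressions to detect lexemes from three pattern sets (positive, negative, spam) and flips polarity when a lexeme is directly preceded by a negation particle. It is not the thesis's implementation: the patterns, the negation particles, the scoring rule, the "neutral" fallback, and the names `POSITIVE`, `NEGATIVE`, `SPAM`, `NEGATORS`, and `classify` are all assumptions chosen for illustration.

```python
import re

# Hypothetical mini-lexicon (NOT taken from the thesis). Each entry is a regular
# expression rather than a fixed word, so informal spellings such as letter
# elongation ("حلوووو") still match.
POSITIVE = [r"حلو+", r"رائع+"]
NEGATIVE = [r"سي+ء", r"بشع+"]
SPAM = [r"https?://\S+"]

# Illustrative dialectal negation particles (an assumption, not an exhaustive list).
NEGATORS = r"(?:مش|مو|ما)"


def _count(patterns, comment):
    """Return (plain, negated) match counts for a list of lexeme patterns."""
    plain = negated = 0
    for pattern in patterns:
        # Optional negator + whitespace, then the lexeme as a whole word.
        # The whole-word restriction is a simplification: it misses lexemes
        # carrying attached prefixes such as the conjunction "و".
        regex = re.compile(
            rf"(?<!\w)(?:(?P<neg>{NEGATORS})\s+)?(?:{pattern})(?!\w)"
        )
        for match in regex.finditer(comment):
            if match.group("neg"):
                negated += 1
            else:
                plain += 1
    return plain, negated


def classify(comment):
    """Label a comment as 'spam', 'positive', 'negative', or 'neutral'."""
    # Spam patterns take priority over sentiment in this sketch.
    if any(re.search(pattern, comment) for pattern in SPAM):
        return "spam"
    pos_plain, pos_negated = _count(POSITIVE, comment)
    neg_plain, neg_negated = _count(NEGATIVE, comment)
    # A negated lexeme counts towards the opposite polarity.
    score = (pos_plain + neg_negated) - (neg_plain + pos_negated)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # fallback label assumed for comments with no evidence
```

Under these assumptions, `classify("الفيلم حلوووو")` returns `"positive"`, while `classify("مش حلو")` returns `"negative"` because the negation particle flips the positive lexeme. A real system for dialectal Arabic would need a far larger lexicon and more careful handling of negation scope and attached prefixes.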

More Information

Contributors:

Thesis advisor - Roast, Chris [0000-0002-6931-6252]
Thesis advisor - Al-Khayatt, Samir

Additional Information:

Director of studies - Chris Roast "No PQ harvesting"

Research Institute, Centre or Group - Does NOT include content added after October 2018:

Sheffield Hallam Doctoral Theses

Identifiers

Identification Number:

10.7190/shu-thesis-00118

Library

Item Type:

Thesis (Doctoral)

Depositing User:

Louise Beirne

Date record made live:

21 Nov 2018 11:25

Last Modified:

03 May 2023 02:06

Date of first compliant deposit:

21 November 2018

Date of first compliant Open Access:

21 November 2018

Version of first compliant deposit:

Author Accepted Manuscript

URI:

https://shura.shu.ac.uk/id/eprint/23402

Sheffield Hallam University Research Archive

Sheffield Hallam University

City Campus, Howard Street

Sheffield S1 1WB

Contact us: shura@shu.ac.uk
