Arabic / English Sentiment Analysis: An Empirical Study

July 21, 2017 | Author: Izzat Alsmadi | Category: Sentiment Analysis, World Wide Web, Empirical Study, Web Service, Social Networking Sites


Mohammed Al-Kabi Faculty of Sciences & IT Zarqa University 13110 Zarqa – Jordan [email protected]

Noor M. Al-Qudah CS Department Faculty of Information Technology The World Islamic Sciences & Education University Tabrbour- Amman- Jordan [email protected]

Muhammad Dabour CIS Department IT & CS Faculty Yarmouk University 21163 Irbid - Jordan [email protected]

ABSTRACT
Web 2.0 refers to the second generation of the World Wide Web (WWW). It allows Internet users to collaborate and share information online, and thereby to form large virtual societies. Web 2.0 includes social network sites, wikis, blogs, Web services, podcasting, multimedia sharing services, etc. Arab users of social network sites (Facebook and Twitter) generate a large daily volume of Arabic and English textual reviews on social, political, and scientific subjects. These reviews may concern products, political events, sports teams, economics, video clips, restaurants, books, actors and actresses, new films and songs, universities, etc. Such a volume of Arabic and English reviews cannot be analyzed manually, so sentiment analysis is used to identify the sentiments, and their subjectivity, in these reviews. To conduct this study, a small dataset of 4,050 Arabic and English reviews was collected, and three polarity dictionaries were created (Arabic, English, and emoticons). The dataset and the dictionaries were used to compare two free online sentiment analysis tools: SocialMention (http://socialmention.com) and Twendz (http://twendz.waggeneredstrom.com/).

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – Information filtering. I.2.7 [Natural Language Processing] – Text analysis.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICICS’13, April 23–25, 2013, Irbid, Jordan. Copyright 2013 ACM 978-1-4503-1327-8/04/2012…$10.00.

Izzat Alsmadi CIS Department IT & CS Faculty Yarmouk University 21163 Irbid - Jordan [email protected]

Heider Wahsheh CIS Department IT & CS Faculty Yarmouk University 21163 Irbid - Jordan [email protected]

General Terms Algorithms, Experimentation.

Keywords Opinion mining, sentiment analysis, Arabic Sentiment Analysis, Arabic Opinion Mining.

1. INTRODUCTION
The last decade witnessed the emergence of Web 2.0, the second generation of the WWW. Web 2.0 enables its users to share, collaborate, and interact with each other in different virtual societies. Social network sites (e.g., Facebook, Twitter, and Google Plus) are an essential part of Web 2.0, and their users generate a large volume of textual reviews. These reviews are written in different natural languages, such as English, Chinese, Arabic, Spanish, and French. The huge volume of opinionated data that accumulates daily has led to rapid growth in the field of sentiment analysis and opinion mining. Opinionated data include opinions, sentiments, attitudes, evaluations, and emoticons. Sentiment analysis, also known as opinion mining, is a field of study that aims to extract opinions from reviews and comments and to discover their trends (polarities). Most sentiment analysis studies are based on textual reviews and comments and do not cover multimedia (image, audio, and video) reviews and comments. A review is categorized as positive, negative, or neutral, or as spam if it is unrelated to the topic under discussion.

The Arabic language is the most widely used Semitic language, with around 300 million speakers. It is written from right to left, as are several other Semitic languages. It ranks fifth among the top 100 most used natural languages worldwide. Arabs use Modern Standard Arabic (MSA) in education and the media, but they mainly use colloquial (dialectal) Arabic to express their views on different aspects of life on social media sites. The main problem with colloquial Arabic is its lack of standardization. Sentiment analysis aims to extract sentiments and determine their polarity (positive, negative, neutral, or spam), and researchers have succeeded in extracting sentiment using modern text mining techniques.

In the real world, companies, businesses, and service providers want to know customers' opinions about their products and services. Consumers also want to know other users’ opinions about a specific product before buying it. Sentiment analysis is also used in elections, where candidates want to know public opinion before the vote, and the public wants to know what others think of the political candidates. The emergence of sentiment analysis and its commercial applications has given rise to an industry based on it [1]. Alexa.com indicates that Facebook and YouTube, for example, are the first and third most visited sites worldwide, and in Jordan, Sudan, and Egypt in particular [2]. The success of social media in attracting users leads those users to generate a huge volume of valuable comments, opinions, and reviews, and this mine of text needs to be analyzed automatically. The last ten years have therefore witnessed intense interest in sentiment analysis worldwide; however, little has been done to analyze Arabic reviews, which mainly use colloquial (dialectal) Arabic. Grammar rules govern the use of MSA, but colloquial Arabic has no such codified rules, and there are no dictionaries for it. There is thus a need for a series of studies that establish the rules of colloquial Arabic and build colloquial Arabic dictionaries. Related studies should use different tools and techniques to extract Arabic sentiments and determine their polarities automatically.

The sentiment analysis methods adopted by researchers in this field vary. Some researchers use keyword-based methods, which rely on a list of selected keywords stored in a seed set to determine the polarity of reviews and sentences. Other sentiment analysis methods are dictionary-based.
Some researchers have used machine learning to classify reviews and sentences and determine their polarity, as in [3, 4]; this approach depends on extracting features such as words, phrases, or parts of speech (POS) [5, 6].

This study began by collecting 4,050 Arabic/English reviews to construct a dataset. The reviews were then tokenized, and three polarity dictionaries were built manually to determine the polarity of each Arabic/English review: one for Arabic reviews, one for English reviews, and one for emoticons. Tests of the two free online sentiment analysis tools, SocialMention [7] and Twendz [8], reveal that SocialMention was the more accurate at identifying sentiments and at handling emoticons, whereas Twendz was weak at recognizing and classifying emoticons.

The remainder of this paper is organized as follows: Section 2 presents a brief review of related work, and Section 3 presents the methodology followed in this study. Section 4 presents the results of the tests, and Section 5 presents concluding remarks and future work.

2. RELATED WORK

This section summarizes a few recent studies related to this one, beginning with sentiment analysis studies of Arabic reviews.

The study of [9] uses machine translation to translate foreign (non-English) source text into English and then conducts sentiment analysis on the translated English text. It covers textual news and blogs written in one of eight natural languages: Arabic, Chinese, French, German, Italian, Japanese, Korean, and Spanish. The non-English text must first be translated automatically into English so that it can be analyzed by a system designed only for English text. The Lydia text analysis system is used to analyze entity sentiment, and the results show that entity sentiment scores are independent of the language and of the machine translation system used. The authors also proposed a sentiment score normalization technique for cross-language polarity comparison, which enables meaningful cross-cultural comparisons.

A number of sentiment analysis methodologies for classifying Web forum opinions in several natural languages were proposed in [10]. These methodologies were tested on three (Arabic and English) datasets: a movie review dataset and two sets of hate/extremist-group forum postings. To achieve their goal, the authors integrated special feature extraction components to capture the linguistic characteristics of Arabic and developed an entropy weighted genetic algorithm (EWGA) for feature selection. Their tests show that the proposed methodologies are effective for sentiment analysis in multiple languages, and that Support Vector Machines (SVM) classify sentiments (identify their polarity) with a high level of accuracy.

The study of [11] designed and implemented a lexicon-based sentiment analysis tool for the colloquial Arabic text used on Arabic social media Websites and in Web forum comments. The researchers proposed that the tool rely partially on human judgment to overcome the problems that arise from non-standardized colloquial Arabic text; one independent component of the tool is a game-based lexicon built on human expertise. Another tool, SAMAR, a sentence-level system for subjectivity and sentiment analysis of Arabic social media varieties, is presented in [12]. SAMAR can analyze Arabic reviews written in colloquial Arabic as well as in Modern Standard Arabic (MSA). These two forms of Arabic vary widely in their vocabularies and in the rules governing their use. In [13], an Arabic corpus for sentiment analysis is constructed. It contains 500 movie reviews, divided equally between positive and negative. Two classification algorithms, Support Vector Machine (SVM) and Naïve Bayes, were used to identify the polarity of each review, and both yielded satisfactory results.

3. METHODOLOGY
This section presents the three main steps followed in this study, which are shown in Figure 1.

Figure 1. An outline of the study framework.

As Figure 1 shows, the first step is to collect one dataset consisting of three categories (Academic, News, and Commercial) and to build three dictionaries. The second step is to determine the polarity of each collected Arabic review, and in the third step the two tools under consideration (SocialMention and Twendz) are evaluated.

4. EXPERIMENTS
4.1 Datasets and Preprocessing
We started by collecting an initial English/Arabic dataset containing 4,050 English/Arabic comments and reviews generated by the users of social network sites. The top 45 words and phrases for each of the three categories (Academic, News, and Commercial) are used in this study, so the words and phrases are divided equally among the categories. They are used as input to the two tools (SocialMention and Twendz) to retrieve reviews and sentences, which are divided equally among the three polarity values (positive, negative, and neutral). SocialMention and Twendz also determine the polarity of each retrieved review and sentence.

The English dictionary built in this study has 3,392 words/phrases, of which 947 are considered positive, 1,100 negative, and 1,345 neutral. The Arabic dictionary has 1,159 words/phrases, of which 427 are considered positive, 306 negative, and 426 neutral. The emoticons dictionary has 204 emoticons, of which 71 are considered positive, 70 negative, and 99 neutral. Phrases are stored in the Arabic and English dictionaries to handle the negation of positive and negative words. For example, the Arabic phrase (Not beautiful, "غير جميل") is stored in the Arabic dictionary and labeled as a negative phrase.

Table 1 shows the top 10 Websites used to collect the comments and reviews for each dataset collected in this study.

Table 1. Top 10 sources for each dataset.

Academic Dataset | Commercial Dataset | News Dataset
--- | --- | ---
Twitter | Twitter | Twitter
Digg | Facebook | Facebook
Facebook | Digg | Digg
Identi | Identi | YouTube
Reddit | Reddit | Reddit
YouTube | YouTube | Delicious
Delicious | Delicious | Identi
Friendfeed | Yahoo news | Amazon
Yahoo news | Amazon | Bing
Google | Autoblog | Dailystar

Each comment and review collected in this study is preprocessed before it is stored in one of the above three categories (Academic, Commercial, and News). Spam and noisy comments are removed to avoid inconsistency, and duplicate comments and reviews are removed so that every entry in the dataset is unique. Because the dataset contains emoticons as well as English and Arabic reviews, three polarity dictionaries (Arabic, English, and emoticons) were built from it to determine the polarity of Arabic and English comments and reviews even when they contain emoticons. These dictionaries were used to evaluate SocialMention and Twendz empirically. Table 2 shows a sample of the content of the three polarity dictionaries used to identify the polarity of each comment and review in the three collected datasets.
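The duplicate-removal part of the preprocessing can be illustrated with a minimal Python sketch. The paper does not say how duplicates were detected, so the rule used here (two comments are duplicates if they match after lower-casing and collapsing whitespace) and the function names `normalize` and `remove_duplicates` are assumptions made only for illustration. Spam filtering is not shown because its criteria are not described.

```python
import re


def normalize(comment: str) -> str:
    """Lower-case a comment and collapse runs of whitespace (assumed duplicate key)."""
    return re.sub(r"\s+", " ", comment).strip().lower()


def remove_duplicates(comments):
    """Drop empty and repeated comments, keeping the first occurrence of each in order."""
    seen = set()
    unique = []
    for comment in comments:
        key = normalize(comment)
        if key and key not in seen:  # skip empty strings and repeats
            seen.add(key)
            unique.append(comment)
    return unique
```

Lower-casing has no effect on Arabic text, so Arabic comments are compared after whitespace normalization only.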

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{4.1}$$

where TP is the number of true positives, FP the number of false positives, TN the number of true negatives, and FN the number of false negatives [14]. Error is the complementary measure: the degree to which the measured values are incorrect [14].
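As a hedged illustration of these measures, the sketch below computes accuracy, error, and the per-class precision and recall of formulas (4.2) and (4.3) from a list of reference labels and a list of predicted labels. Two points are assumptions rather than statements from the paper: accuracy is generalized to three classes as the fraction of correct predictions, and error is taken as 1 − accuracy. The function names and label strings are hypothetical.

```python
def accuracy(gold, predicted):
    """Fraction of entries whose predicted label equals the reference label."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def error(gold, predicted):
    """Assumed here to be the complement of accuracy."""
    return 1.0 - accuracy(gold, predicted)


def precision_recall(gold, predicted, label):
    """Per-class precision TP/(TP+FP) and recall TP/(TP+FN) for one class label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == label and g == label)
    fp = sum(1 for g, p in pairs if p == label and g != label)
    fn = sum(1 for g, p in pairs if p != label and g == label)
    precision = tp / (tp + fp) if tp + fp else 0.0  # no predictions for this class
    recall = tp / (tp + fn) if tp + fn else 0.0     # no reference entries for this class
    return precision, recall
```

For example, with reference labels `["pos", "pos", "neg", "neu"]` and predictions `["pos", "neg", "neg", "neu"]`, three of four entries are correct, so the accuracy is 0.75.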

Table 2. A sample of the three polarity dictionary contents.

Dictionary | Positive | Negative | Neutral
--- | --- | --- | ---
English | Good | Bad | Samsung
Arabic | (Love, "حب") | (I hate, "أكره") | (Nokia, "نوكيا")
Emoticons | (^_^, "Happy") | (3:), "Devil") | (\_/, "Empty Glass")

A program was designed and implemented to encode the contents of the three polarity dictionaries. The program reads each dictionary and assigns to each entry one of three values: 1 for positive, 0 for negative, and ? for neutral. Each dictionary entry is in Arabic, in English, or an emoticon. After the polarity of each dictionary entry has been identified, the program reads each entry (comment or review) in the collected datasets and creates a sequence of symbols (0, 1, ?) from which the final polarity of that entry is determined. The output of this step is stored in two binary files, one dedicated to Arabic entries and the other to English entries. Each file covers positive, negative, and neutral words as well as emoticons. These two files represent the polarity of the datasets from the point of view of the three polarity dictionaries, and the evaluation of SocialMention and Twendz is based on them.
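The encoding step can be sketched in Python as follows. This is not the authors' program: the tiny dictionary, the longest-phrase-first matching, and especially the rule that turns the symbol sequence into a final polarity (a simple majority of 1s over 0s, with ties counted as neutral) are assumptions, since the paper does not state how the sequence is resolved. Tokenization is by whitespace only, so punctuation attached to a word would prevent a match.

```python
POSITIVE, NEGATIVE, NEUTRAL = "1", "0", "?"

# Hypothetical miniature dictionary; a few entries echo Table 2.
DICTIONARY = {
    "good": POSITIVE,
    "bad": NEGATIVE,
    "not good": NEGATIVE,    # negation stored as a phrase
    "samsung": NEUTRAL,
    "^_^": POSITIVE,         # emoticon entry
    "حب": POSITIVE,
    "غير جميل": NEGATIVE,    # "not beautiful"
}

MAX_PHRASE_LEN = max(len(entry.split()) for entry in DICTIONARY)


def encode(review):
    """Turn a review into a string of symbols (1, 0, ?), matching longest phrases first."""
    tokens = review.lower().split()
    symbols = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in DICTIONARY:
                symbols.append(DICTIONARY[phrase])
                i += n
                break
        else:
            i += 1  # token is not in the dictionary; skip it
    return "".join(symbols)


def final_polarity(symbols):
    """Assumed rule: majority of positive vs. negative symbols; ties are neutral."""
    pos = symbols.count(POSITIVE)
    neg = symbols.count(NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

Matching the longest phrase first is what lets a stored negation such as "not good" override the positive entry "good".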

4.2 Results
This section presents the preliminary results of evaluating the accuracy of SocialMention and Twendz. The goal of this study is to evaluate the two free online sentiment analysis tools by finding the classification accuracy of each tool and by developing a dictionary-based classifier. The evaluation is based on three popular machine learning algorithms, Naïve Bayes, Support Vector Machine (SVM), and K-Nearest Neighbor (K-NN), where 66% of the collected dataset is used as the learning set and 34% as the test set. The evaluation uses four performance measures: accuracy, error, precision, and recall. Accuracy is the degree to which the measured values agree with the correct values; it is given by formula (4.1).

The other two performance measures, recall and precision, are given by formulas (4.2) and (4.3), respectively [15]:

$$\text{Recall}_i = \frac{TP}{TP + FN} \tag{4.2}$$

$$\text{Precision}_i = \frac{TP}{TP + FP} \tag{4.3}$$

where TP is the number of documents correctly classified as belonging to class i (“true positive”), FP is the number of documents falsely classified as belonging to class i (“false positive”), and FN is the number of documents falsely classified as not belonging to class i (“false negative”) [15].

Using the Naïve Bayes algorithm, the Twendz tool yielded an accuracy of 45.3%, while the SocialMention tool yielded an accuracy of 66.2%. Table 3 shows the detailed evaluation results for both tools using Naïve Bayes.

Table 3. Naïve Bayes evaluation results for SocialMention and Twendz.

Performance Measures | Twendz | SocialMention
--- | --- | ---
Accuracy | 45.3% | 66.2%
Error | 54.7% | 33.7%
Precision | 0.45 | 0.66
Recall | 0.45 | 0.66

Table 3 summarizes the empirical accuracy of the two free online sentiment analysis tools (SocialMention and Twendz) in identifying the polarity of each Arabic/English comment and review in the three collected datasets. The results indicate that SocialMention is more effective than Twendz at identifying the polarity of each entry in the collected dataset. Table 4 presents the detailed evaluation results of the two tools using SVM, which yields an accuracy of 43.3% for the Twendz tool and 65.4% for the SocialMention tool.

Table 4. SVM evaluation results for SocialMention and Twendz.

Performance Measures | Twendz | SocialMention
--- | --- | ---
Accuracy | 43.3% | 65.4%
Error | 56.7% | 34.6%
Precision | 0.47 | 0.65
Recall | 0.47 | 0.65

Although the results in Table 4 show a slight drop in accuracy for both tools (Twendz and SocialMention), the performance measures of SocialMention remain better than those of Twendz. Table 5 presents the detailed evaluation results of the two tools using K-NN. Applying the K-NN algorithm with K = 1 yields an accuracy of 44.4% for the Twendz tool and 62.5% for the SocialMention tool.
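To make the evaluation protocol concrete, the following Python sketch shows a 66%/34% percentage split and a nearest-neighbour classifier with K = 1. It is only an illustration under stated assumptions: the paper does not describe the features, the distance function, or the toolkit that was used, so the word-set Jaccard distance, the fixed random seed, and all function names here are hypothetical choices.

```python
import random


def percentage_split(items, train_fraction=0.66, seed=0):
    """Shuffle a copy of the items and split it into (learning set, test set)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over the word sets of two texts (assumed distance)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def predict_1nn(train, text):
    """Return the label of the (text, label) training pair closest to `text`."""
    _, nearest_label = min(train, key=lambda pair: jaccard_distance(pair[0], text))
    return nearest_label
```

With K = 1 the prediction depends on a single training example, which makes the classifier sensitive to noisy labels; larger values of K would require a voting step that is not shown here.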

Table 5. K-NN evaluation results for SocialMention and Twendz.

Performance Measures | Twendz | SocialMention
--- | --- | ---
Accuracy | 44.4% | 62.6%
Error | 55.6% | 37.4%
Precision | 0.46 | 0.63
Recall | 0.44 | 0.62

The results in Table 5 indicate that, for the Twendz tool, the K-NN accuracy lies between the Naïve Bayes and SVM accuracies. For the SocialMention tool, the K-NN accuracy is the lowest of the three algorithms. Overall, the Naïve Bayes algorithm yields the best results for both SocialMention and Twendz, and the experiments show that the SocialMention sentiment analysis tool is more effective than its counterpart (Twendz).

5. CONCLUSION AND FUTURE WORK
This study is based on one small dataset containing 4,050 Arabic and English comments and reviews collected from Yahoo news, YouTube, Facebook, Twitter, Digg, Identi, Reddit, etc. Two free online sentiment analysis tools (SocialMention and Twendz) were used to collect them. The collected comments and reviews are classified into three equal categories (commercial, academic, and political news). The dataset was used to create three polarity dictionaries (Arabic, English, and emoticons), which are used to identify the polarity of each comment and review in the dataset. A program was designed and implemented to assign a polarity to each entry in the dictionaries and in the dataset. Benchmarking tests show that SocialMention identifies the polarity of Arabic/English comments and reviews more accurately than its counterpart, Twendz. Future work includes using a larger dataset and testing more free online sentiment analysis tools.

6. REFERENCES
[1] Liu, B. 2012. Sentiment Analysis and Opinion Mining (Synthesis Lectures on Human Language Technologies). Morgan & Claypool.
[2] Alexa Top 500 Global Sites, http://www.alexa.com/topsites, accessed on January 6, 2013.
[3] Pang, B., Lee, L., and Vaithyanathan, S. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). Philadelphia, USA, 79-86.
[4] Pang, B., and Lee, L. 2004. A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics (ACL '04). Association for Computational Linguistics, Stroudsburg, PA, USA, Article 271.
[5] Thelwall, M., Wilkinson, D., and Uppal, S. 2010. Data Mining Emotion in Social Network Communication: Gender Differences in MySpace. Journal of the American Society for Information Science and Technology. 61, 1, 190-199.
[6] Thelwall, M., Buckley, K., and Paltoglou, G. 2012. Sentiment strength detection for the social web. Journal of the American Society for Information Science and Technology. 63, 1, 163-173.
[7] Socialmention website, accessed on February 24, 2012, from http://www.socialmention.com
[8] Twendz website, accessed on February 24, 2012, from http://twendz.waggeneredstrom.com
[9] Bautin, M., Vijayarenu, L., and Skiena, S. 2008. International Sentiment Analysis for News and Blogs. In 2nd International Conference on Weblogs and Social Media (ICWSM 2008). 19-26.
[10] Abbasi, A., Chen, H., and Salem, A. 2008. Sentiment analysis in multiple languages: Feature selection for opinion classification in Web forums. ACM Transactions on Information Systems. 26, 3, Article 12, 1-34.
[11] Al-Subaihin, A., Al-Khalifa, H., and Al-Salman, A. 2011. A proposed sentiment analysis tool for modern Arabic using human-based computing. In Proceedings of the 13th International Conference on Information Integration and Web-based Applications and Services (iiWAS '11). ACM, New York, NY, USA, 543-546.
[12] Abdul-Mageed, M., Kübler, S., and Diab, M. 2012. SAMAR: a system for subjectivity and sentiment analysis of Arabic social media. In Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis (WASSA '12). Association for Computational Linguistics, Stroudsburg, PA, USA, 19-28.
[13] Rushdi-Saleh, M., Martín-Valdivia, M., Ureña-López, L., and Perea-Ortega, J. 2011. OCA: Opinion corpus for Arabic. Journal of the American Society for Information Science and Technology. 62, 10, 2045-2054.
[14] Witten, I. H., and Frank, E. 2005. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Series in Data Management Systems, second edition. Morgan Kaufmann (MK).
[15] Paltoglou, G., and Thelwall, M. 2012. Twitter, MySpace, Digg: Unsupervised Sentiment Analysis in Social Media. ACM Transactions on Intelligent Systems and Technology (TIST). 3, 4, Article 66, 1-19.
