Sentiment Classification of Bank Clients’ Reviews Written in the Polish Language

Idczak, Adam Piotr

doi:10.18778/0208-6018.353.03

Article details

Journal

Acta Universitatis Lodziensis. Folia Oeconomica

2021 | 2 | 353 | 43-56

Article title

Sentiment Classification of Bank Clients’ Reviews Written in the Polish Language

Authors

Adam Piotr Idczak

Title variants

PL

Analiza sentymentu na podstawie polskojęzycznych recenzji klientów banku

Abstracts

PL (English translation)

It is estimated that about 80% of all data collected and stored in companies' information systems takes the form of text documents. The article is devoted to one of the basic problems of text mining, namely text classification in sentiment analysis, understood as the study of a text's overall tone. The lack of a defined structure in text documents is an obstacle to this task. This has forced the development of many different techniques for determining document sentiment. The article presents a comparative analysis of two sentiment analysis methods: the naive Bayes classifier and logistic regression. The texts studied are written in Polish, come from banks, and are of a marketing nature. Classification was carried out using the bag‑of‑n‑grams approach, in which a text document is represented by subsequences consisting of a specified number n of words. The results showed that logistic regression performed better.

EN

It is estimated that approximately 80% of all data gathered by companies are text documents. This article is devoted to one of the most common problems in text mining, i.e. text classification in sentiment analysis, which focuses on determining the sentiment of a document. A lack of defined structure of the text makes this problem more challenging. This has led to the development of various techniques used in determining the sentiment of a document. In this paper, a comparative analysis of two methods in sentiment classification, a naive Bayes classifier and logistic regression, was conducted. Analysed texts are written in the Polish language and come from banks. The classification was conducted by means of a bag‑of‑n‑grams approach, where a text document is presented as a set of terms and each term consists of n words. The results show that logistic regression performed better.
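
The bag‑of‑n‑grams representation mentioned in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the function name `bag_of_ngrams`, the whitespace tokenization, and the sample review are assumptions made purely for illustration, since this record does not describe the article's preprocessing.

```python
from collections import Counter


def bag_of_ngrams(text, n=2):
    """Return counts of n-word terms (word n-grams) found in a document.

    Hypothetical helper for illustration only.
    """
    # Naive tokenization: lowercase and split on whitespace (an assumption;
    # real Polish text would likely need punctuation handling and more).
    tokens = text.lower().split()
    # Slide a window of n consecutive words over the token list.
    ngrams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    # The "bag" ignores term order and keeps only how often each term occurs.
    return Counter(ngrams)


# Made-up sample review, not taken from the article's data set.
review = "bardzo dobra obsługa i bardzo dobra aplikacja"
bigrams = bag_of_ngrams(review, n=2)
```

Term counts of this kind could then serve as input features for a classifier such as naive Bayes or logistic regression, the two methods the article compares.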

Keywords

PL

analiza sentymentu, klasyfikacja dokumentów, textmining, regresja logistyczna, naiwny klasyfikator Bayesa

EN

sentiment analysis, opinion mining, text classification, text mining, logistic regression, naive Bayes classifier

Publisher

Uniwersytet Łódzki. Wydawnictwo Uniwersytetu Łódzkiego

Journal

Acta Universitatis Lodziensis. Folia Oeconomica

Year

2021

Volume

2

Issue

353

Pages

43-56

Dates

published

2021

Contributors

author

Adam Piotr Idczak

University of Łódź, Faculty of Economics and Sociology, Department of Statistical Methods, Łódź, Poland

https://orcid.org/0000-0001-9676-2410

Identifiers

DOI

10.18778/0208-6018.353.03

Biblioteka Nauki

2033889

YADDA identifier

bwmeta1.element.ojs-doi-10_18778_0208-6018_353_03
