Automatyczne rozpoznawanie ofert kupna, sprzedaży i zamiany w tekstach w języku polskim

Małyszko, Jacek; Bukowska, Elżbieta; Filipowska, Agata; Perkowski, Bartosz; Stolarski, Piotr; Wieloch, Karol

Article details

Journal

Studia Oeconomica Posnaniensia

2013 | 1 | 5(254)

Article title

Automatyczne rozpoznawanie ofert kupna, sprzedaży i zamiany w tekstach w języku polskim

Authors

Małyszko Jacek, Bukowska Elżbieta, Filipowska Agata, Perkowski Bartosz, Stolarski Piotr, Wieloch Karol

Title variants

EN

Automatic identification of buy, sell and exchange offers in unstructured texts written in the Polish language

Abstracts

PL

Artykuł prezentuje wyniki prac i eksperymentów dotyczących problemu przetwarzania niestrukturyzowanych tekstów napisanych w języku polskim w celu identyfikacji w nich ofert kupna, sprzedaży lub wymiany. W badaniach wykorzystano reguły ekstrakcji przygotowane na podstawie przeprowadzonej analizy korpusu. W artykule omówione są wybrane przykłady reprezentujące trudności, jakie niesie ze sobą omawiany problem. Opracowane podejście zostało poddane eksperymentalnej ocenie, na której podstawie skuteczność identyfikacji ofert została określona na 83% (według miary F1), natomiast określanie typu oferty (czy jest to kupno, czy sprzedaż) działa poprawnie w ponad 95% przypadków.

EN

This article presents the results of research and experiments on processing unstructured texts written in Polish in order to identify those that contain buy, sell or exchange offers. The approach relies on manually prepared extraction rules derived from an analysis of a corpus of documents obtained from the Internet (within the Semantic Monitoring of Cyberspace project). Selected text fragments are discussed to illustrate the challenges that had to be addressed. The approach was then evaluated experimentally: offer identification reached an F1-score of 83%, while the offer type (buying or selling) was determined correctly in over 95% of cases.

Keywords

PL

przetwarzanie języka naturalnego, ekstrakcja informacji

EN

industrial organization, industry studies: services, information and internet services, computer software

Publisher

Wydawnictwo Uniwersytetu Ekonomicznego w Poznaniu

Year

2013

Volume

1

Issue

5(254)

Contributors

Małyszko Jacek (author), Uniwersytet Ekonomiczny w Poznaniu

Bukowska Elżbieta (author), Uniwersytet Ekonomiczny w Poznaniu

Filipowska Agata (author), Uniwersytet Ekonomiczny w Poznaniu

Perkowski Bartosz (author), Uniwersytet Ekonomiczny w Poznaniu

Stolarski Piotr (author), Uniwersytet Ekonomiczny w Poznaniu

Wieloch Karol (author)

Identifiers

YADDA identifier

bwmeta1.element.desklight-77733c95-9229-4ff6-a37b-875cdf441d8b
