A SEARCH OF SIGNIFICANT PHRASES FOR BUILDING TOPIC MODELS IN TEXT DOCUMENTS

Ożdżyński, Piotr; Zakrzewska, Danuta

Article details

Journal

Information Systems in Management

2016 | 5 | 2 | 205-214

Languages of publication

EN

Abstracts

EN

The huge number of documents in digital libraries calls for efficient methods of exploring the information they contain. Topic modeling is considered one of the most effective of these methods. In contrast to the commonly used approaches, which look for occurrences of single words, this paper considers building topic models based on phrases. We propose a methodology that creates a set of significant word sequences and thus limits the search area to phrases that contain them. The methodology is evaluated in experiments on real text datasets, and the results are compared with those obtained using the LDA algorithm.

Keywords

EN

topic model, frequent sequences, LDA

Publisher

Katedra Informatyki Szkoła Główna Gospodarstwa Wiejskiego w Warszawie

Year

2016

Volume

5

Issue

2

Pages

205-214

Physical description

Dates

published

2016

Contributors

author

Ożdżyński Piotr

Institute of Information Technology, Lodz University of Technology

author

Zakrzewska Danuta

Institute of Information Technology, Lodz University of Technology

References

Papadimitriou C., Raghavan P., Tamaki H., Vempala S. (2000) Latent Semantic Indexing: A probabilistic analysis, Journal of Computer and System Sciences, Vol. 61 (2), 217–235
Blei D., Ng A., Jordan M. (2003) Latent Dirichlet allocation, Journal of Machine Learning Research, 3, 993–1022
Blei D. (2012) Probabilistic topic models, Communications of the ACM, 55 (4), 77–84
Danilevsky M., Wang C., Desai N., Ren X., Guo J., Han J. (2014) Automatic Construction and Ranking of Topical Keyphrases on Collections of Short Documents, SDM’14
Han J., Pei J., Yin Y., Mao R. (2004) Mining frequent patterns without candidate generation: A frequent-pattern tree approach, Data Min. Knowl. Discov., 8 (1), 53–87
El-Kishky A., Song Y., Wang C., Voss C., Han J. (2014) Scalable Topical Phrase Mining from Text Corpora, Proceedings of the VLDB Endowment, Vol. 8 (3), 305–316
Agrawal R., Srikant R. (1994) Fast algorithms for mining association rules in large databases, Proceedings of the 20th International Conference on Very Large Data Bases (VLDB ’94), Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 487–499
Machine Learning for Language Toolkit http://mallet.cs.umass.edu/
Hamming R.W. (1950) Error detecting and error correcting codes, The Bell System Technical Journal, Vol. 29 (2)
ftp://medir.ohsu.edu/pub/ohsumed
http://www.ai.mit.edu/people/jrennie/20Newsgroups/

Identifiers

ISSN

2084-5537

YADDA identifier

bwmeta1.element.desklight-4ee7375c-ff34-4e9a-a20f-6b87be5c329b
