2010 | 21 | 17-38

Article title

The Semi-automatic Construction of the Polish Cyc Lexicon

Languages of publication

PL

Abstracts

PL
In this paper we discuss the problem of building a Polish lexicon for the Cyc ontology. Because the ontology is very large and complex, we describe a semi-automatic translation of part of it, which might be useful for tasks on the border between the Semantic Web and Natural Language Processing. We concentrate on the precise identification of lexemes, which is crucial for tasks such as natural language generation in heavily inflected languages like Polish. We also concentrate on multi-word entries, since nine out of every ten Cyc concepts are mapped to expressions containing more than one word.

Year

2010

Volume

21

Pages

17-38

Dates

published
2010-06-15

Contributors

  • Computational Linguistics Department, Jagiellonian University, Cracow, Poland

References

  • Amaro, R., Chaves, R.P., Marrafa, P., Mendes, S.: Enriching Wordnets with new Relations and with Event and Argument Structures. In: Seventh International Conference on Intelligent Text Processing and Computational Linguistics. pp. 28–40 (2006).
  • Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Ives, Z.: DBpedia: A nucleus for a web of open data. In: The Semantic Web, ISWC 2007. pp. 722–735. Springer (2007)
  • Chrząszcz, P.: Automatyczne rozpoznawanie i klasyfikacja nazw wielosegmentowych na podstawie analizy haseł encyklopedycznych. Master’s thesis, UST, Cracow (2009)
  • Fellbaum, C.: WordNet: An Electronic Lexical Database. MIT Press (1998).
  • Jurafsky, D., Martin, J.H.: Speech and language processing (second edition). Prentice Hall (2009).
  • Lenat, D.B.: CYC: A large-scale investment in knowledge infrastructure. Communications of the ACM 38(11), 33–38 (1995)
  • Nastase, V., Strube, M., Börschinger, B., Zirn, C., Elghafari, A.: WikiNet: A Very Large Scale Multi-Lingual Concept Network. In: Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC’10) (2010).
  • Piasecki, M., Szpakowicz, S., Broda, B.: A Wordnet from the Ground Up. Oficyna Wydawnicza Politechniki Wrocławskiej (2009)
  • Pisarek, P.: Słowniki komputerowe i automatyczna ekstrakcja informacji z tekstu, chap. Słownik fleksyjny, pp. 37–68. Uczelniane Wydawnictwo Naukowo-Dydaktyczne AGH (2009).
  • Pohl, A.: Automatic Construction of the Polish Nominal Lexicon for the OpenCyc Ontology, pp. 51–64. EXIT (2009)
  • Przepiórkowski, A.: The potential of the IPI PAN corpus. Poznań Studies in Contemporary Linguistics 41, 31–48 (2006)
  • Sarjant, S., Legg, C., Robinson, M., Medelyan, O.: “All You Can Eat” Ontology-Building: Feeding Wikipedia to Cyc. In: Web Intelligence’09. pp. 341–348 (2009)
  • Somers, H.: Review Article: Example-based Machine Translation. Machine Translation 14(2), 113–157 (1999).
  • Suchanek, F., Kasneci, G., Weikum, G.: YAGO: A Large Ontology from Wikipedia and WordNet. Web Semantics: Science, Services and Agents on the World Wide Web 6, 203–217 (2008).
  • Woliński, M.: System znaczników morfosyntaktycznych w korpusie IPI PAN. Polonica XII, 39–54 (2004).
  • Woliński, M.: Morfeusz – a Practical Tool for the Morphological Analysis of Polish. In: Intelligent Information Processing and Web Mining, IIS:IIPWM’06 Proceedings. pp. 503–512. Springer (2006).

YADDA identifier

bwmeta1.element.ojs-doi-10_14746_il_2010_21_2