INFORMATION EXTRACTION FROM WEB PAGES FOR THE NEEDS OF EXPERT FINDING

Kaczmarek, Tomasz; Zyskowski, Dominik; Walczak, Adam; Abramowicz, Witold

Article details

Journal

Studies in Logic, Grammar and Rhetoric

2010 | 22(35) | 141-157

Article title

INFORMATION EXTRACTION FROM WEB PAGES FOR THE NEEDS OF EXPERT FINDING

Authors

Kaczmarek Tomasz, Zyskowski Dominik, Walczak Adam, Abramowicz Witold

Languages of publication

EN

Abstracts

EN

This paper describes a mechanism for extracting relevant information about people from Polish portals for professionals. The extraction method is based on the hierarchical execution of XPath commands and regular expressions, depending on the structure of the processed documents. The extraction component, EXT, is part of the eXtraSpec system, whose task is to support the Human Resources departments of Polish companies during recruitment and team building. EXT can handle several sources of information as well as user profiles acquired from professionals' portals. The article also discusses the advantages of the chosen extraction method in the context of the goals of the whole eXtraSpec system and outlines directions for future research.
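To illustrate what hierarchical execution of XPath commands and regular expressions can look like, here is a minimal sketch. It is not the EXT implementation: the page markup, the rule format (`RULES`), and the `extract` function are all hypothetical, and it relies on the limited XPath subset of Python's standard `xml.etree.ElementTree`, which requires well-formed input.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed profile page; real portal markup would differ.
PAGE = """
<html><body>
  <div class="profile">
    <h1>Jan Kowalski</h1>
    <ul class="jobs">
      <li>Analyst at Example Sp. z o.o. (2005-2008)</li>
      <li>Consultant at Sample SA (2008-2010)</li>
    </ul>
  </div>
</body></html>
"""

# Assumed rule format: each rule has an XPath evaluated relative to its
# parent's match, and either nested child rules or a regex for the node text.
RULES = {
    "xpath": ".//div[@class='profile']",
    "children": [
        {"name": "name", "xpath": "./h1", "regex": r"(?P<value>.+)"},
        {
            "name": "jobs",
            "xpath": "./ul[@class='jobs']/li",
            "regex": r"(?P<role>.+?) at (?P<company>.+?) "
                     r"\((?P<start>\d{4})-(?P<end>\d{4})\)",
        },
    ],
}


def extract(node, rule):
    """Apply `rule` to `node`; return one result per element the XPath selects."""
    results = []
    for match in node.findall(rule["xpath"]):
        if "children" in rule:
            # Inner rule: descend, evaluating child XPaths relative to `match`.
            results.append(
                {child["name"]: extract(match, child) for child in rule["children"]}
            )
        else:
            # Leaf rule: run the regex over the element's text content.
            text = "".join(match.itertext()).strip()
            found = re.search(rule["regex"], text)
            if found:
                results.append(found.groupdict())
    return results


profiles = extract(ET.fromstring(PAGE.strip()), RULES)
print(profiles)
```

In this sketch the XPath steps narrow the document down to a structural region, and the regular expressions are applied only to the text of the selected nodes, so each pattern can stay short and specific to one kind of field.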

Keywords

EN

EXTRASPEC; HIERARCHICAL ALGORITHM; POLISH LANGUAGE; WEB INFORMATION EXTRACTION; XPATH

Discipline

LIBRARY & INFORMATION SCIENCE

Publisher

Sciendo

Journal

Studies in Logic, Grammar and Rhetoric

Year

2010

Issue

22(35)

Pages

141-157

Document type

ARTICLE

Contributors

author

Kaczmarek Tomasz

author

Zyskowski Dominik

author

Walczak Adam

author

Abramowicz Witold

Tomasz Kaczmarek, Poznan University of Economics, Faculty of Informatics and Electronic Economy, Department of Information Systems, Poznan, Poland
