This paper describes a mechanism for the extraction of relevant information about people from Polish portals for professionals. The method of information extraction is based on hierarchical execution of XPath commands and regular expressions depending on the structure of processed documents. The extraction component EXT is a part of the eXtraSpec system, which task is to support Human Resources departments of Polish companies during recruitment and team building. EXT is able to deal with several sources of information and with user profiles that are acquired from professionals' portals. In this article we also discuss the advantages of the chosen extraction method in the context of the goals of the whole eXtraSpec system and we show the directions of future research.
Tomasz Kaczmarek, Poznan University of Economics, Faculty of Informatics and Electronic Economy, Department of Information Systems, Poznan, Poland
Publication order reference
CEJSH db identifier