Adaptive Information Extraction from Structured Text Documents

Ożdżyński, Piotr; Zakrzewska, Danuta

Article details

Journal

Information Systems in Management

2014 | 3 | 4 | 261-272

Languages of publication

EN

Abstracts

EN

Effective analysis of structured documents can determine the performance of management information systems. This paper considers an adaptive method of information extraction from structured text documents. We assume that documents belong to thematic groups and that the required set of information can be determined a priori. Knowledge of the document structure makes it possible to indicate the blocks in which particular information is more likely to appear. The result is structured data that can be analysed further. The proposed solution uses dictionaries and inflection analysis and can be applied to Polish texts. The presented approach can be used for information extraction from official letters, information sheets and product specifications.

Keywords

EN

Natural Language Processing; Information Extraction; Tagging; Named Entity Recognition

Publisher

Katedra Informatyki Szkoła Główna Gospodarstwa Wiejskiego w Warszawie

Year

2014

Volume

3

Issue

4

Pages

261-272

Dates

published

2014

Contributors

author

Ożdżyński Piotr

Institute of Information Technology, Lodz University of Technology

author

Zakrzewska Danuta

Institute of Information Technology, Lodz University of Technology

Identifiers

ISSN

2084-5537

YADDA identifier

bwmeta1.element.desklight-1f100ad4-f713-493e-a4c1-c3fe8d559f17
