2014 | 3 | 4 | 261-272
Article title

Adaptive Information Extraction from Structured Text Documents

Languages of publication
EN
Abstracts
EN
Effective analysis of structured documents can determine the performance of management information systems. The paper considers an adaptive method of information extraction from structured text documents. We assume that documents belong to thematic groups and that the required set of information can be determined a priori. Knowledge of the document structure makes it possible to indicate the blocks in which particular information is more likely to appear. As a result, structured data that can be analysed further are obtained. The proposed solution uses dictionaries and inflection analysis, and may be applied to Polish texts. The presented approach can be used for information extraction from official letters, information sheets and product specifications.
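The idea outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the block markers (`[NAME]`), the field definitions in `FIELDS`, and the small form dictionary `FORMS` (a lookup table standing in for real Polish inflection analysis) are all hypothetical and chosen only for illustration. A field is accepted only when its keyword, reduced to a lemma, appears in a block where that field is expected.

```python
import re

# Hypothetical form dictionary: inflected Polish form -> lemma.
# A toy stand-in for real inflection analysis.
FORMS = {
    "cena": "cena", "ceny": "cena", "cenie": "cena",
    "dostawca": "dostawca", "dostawcy": "dostawca",
}

# Hypothetical field definitions: the lemma that signals a field and the
# blocks in which that field is expected to appear.
FIELDS = {
    "price": {"lemma": "cena", "blocks": {"OFERTA"}},
    "supplier": {"lemma": "dostawca", "blocks": {"NAGLOWEK"}},
}


def split_blocks(text):
    """Split text into {block_name: [lines]}; a block starts at a '[NAME]' line."""
    blocks, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        header = re.fullmatch(r"\[(\w+)\]", line)
        if header:
            current = header.group(1)
            blocks[current] = []
        elif current is not None and line:
            blocks[current].append(line)
    return blocks


def extract(text):
    """Return {field: value} using only the blocks expected for each field."""
    blocks = split_blocks(text)
    record = {}
    for field, spec in FIELDS.items():
        for name in spec["blocks"]:
            for line in blocks.get(name, []):
                key, sep, value = line.partition(":")
                if not sep:
                    continue  # not a 'label: value' line
                # Map every word of the label to its lemma (None if unknown).
                lemmas = {FORMS.get(tok.lower()) for tok in key.split()}
                if spec["lemma"] in lemmas:
                    # Keep the first match found for the field.
                    record.setdefault(field, value.strip())
    return record


sample = """[NAGLOWEK]
Nazwa dostawcy: ACME
[OFERTA]
Informacja o cenie: 120 PLN
Dostawca: Inny
"""
result = extract(sample)
```

In this sample, the line `Dostawca: Inny` is ignored because it appears in a block where the supplier field is not expected, while the inflected forms `dostawcy` and `cenie` are still matched through their lemmas.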
Year
2014
Volume
3
Issue
4
Pages
261-272
Dates
published
2014
Contributors
  • Institute of Information Technology, Lodz University of Technology
  • Institute of Information Technology, Lodz University of Technology
Identifiers
ISSN
2084-5537
YADDA identifier
bwmeta1.element.desklight-1f100ad4-f713-493e-a4c1-c3fe8d559f17