Combined Machine-Learning Approach to PoS-Tagging of Middle English Corpora

Karimov, Raoul

doi:10.15290/cr.2018.21.2.04

Article details

Journal

Crossroads. A Journal of English Studies

2018 | 21 |

Article title

Combined Machine-Learning Approach to PoS-Tagging of Middle English Corpora

Authors

Karimov, Raoul

Selected contents from this journal

http://repozytorium.uwb.edu.pl/jspui/handle/11320/750

Abstracts

EN

This paper considers the problem of part-of-speech (PoS) tagging in Middle English corpora, and in historical corpora more generally. PoS-tagging is now considered a solved problem for Modern English, where it is mainly achieved via hidden Markov models (HMM) and matrix-based word-to-vector conversions, with every word in the dictionary being embedded into a single dimension. This approach, however, relies on recurrent syntactic structures and context-free generative grammars, and is therefore not applicable to older stages of the English language, whose word order is irregular. We therefore believe that Middle English could be better handled by a morphographemic encoding and instance-based machine learning algorithms such as SVM, random forests and kNN. Using a moving-average method to generate multidimensional vectors that give a reliable numeric representation of character composition and sequences, we achieved a precision and recall of 87.5% in classifying Middle English words by their part of speech with a simple combined voting-based binary classifier. This result can be deemed satisfactory and encourages further research in the area.
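The abstract does not spell out the encoding or the voting scheme, so the following is only a minimal sketch of the general idea: a moving average over character codes gives each word a fixed-length numeric vector, and several classifiers' predictions are combined by majority vote. The function names, the `window` and `dim` values, the toy word list and its labels are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def moving_average_vector(word, window=2, dim=6):
    """Map a word to a fixed-length vector of moving averages of character codes.

    `window` and `dim` are illustrative values, not taken from the paper.
    """
    codes = [ord(ch) for ch in word.lower()]
    averages = [
        sum(codes[i:i + window]) / window
        for i in range(len(codes) - window + 1)
    ]
    if not averages:
        # Word shorter than the window: fall back to the plain mean (0.0 if empty).
        averages = [sum(codes) / len(codes)] if codes else [0.0]
    # Zero-pad or truncate so all words share one dimensionality.
    return (averages + [0.0] * dim)[:dim]


def nearest_label(vector, training):
    """1-nearest-neighbour lookup by squared Euclidean distance."""
    def distance(item):
        return sum((a - b) ** 2 for a, b in zip(vector, item[0]))
    return min(training, key=distance)[1]


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


# Toy, hand-labelled examples (hypothetical, for illustration only).
training = [
    (moving_average_vector("singen"), "verb"),
    (moving_average_vector("knyght"), "noun"),
]
votes = [
    nearest_label(moving_average_vector("singen"), training),
    "verb",  # stand-in for a second classifier's prediction
    "noun",  # stand-in for a third classifier's prediction
]
print(majority_vote(votes))
```

In a fuller setup the stand-in votes would presumably come from separately trained models (for example an SVM, a random forest and kNN, as the abstract lists), each receiving the same character-based vectors.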

Keywords

EN

Instance-Based Learning; Corpus; Middle English; PoS-Tagging; Moving Average

Publisher

The University of Bialystok

Journal

Crossroads. A Journal of English Studies

Year

2018

Volume

21

Dates

published

2018

Contributors

author

Karimov, Raoul

References

Aha, David W., Kibler, Dennis, Albert, Marc K. 1991. Instance-based learning algorithms. Machine Learning 6-1, 37-66.
Beesley, Kenneth R., Karttunen, Lauri. 2004. Finite-State Morphology. Journal of Computational Linguistics 30-2, 237-249.
Breiman, Leo. 2001. Random Forests. https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf (19 April, 2018).
Cristianini, Nello, Shawe-Taylor, John. 2000. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge: Cambridge University Press.
Frank, Eibe, Witten, Ian H. 2016. Data Mining: Practical Machine Learning Tools and Techniques. Burlington: Morgan Kaufmann.
Ilyish, Boris A. 1968. History of the English Language. Moscow: Vysshaya Shkola.
Jędrzejowicz, Piotr, Strychowski, Jakub A. 2005. Neural Network Based Morphological Analyser of the Natural Language. Intelligent Information Processing and Web Mining. Advances in Soft Computing 31, 199-208.
Jurafsky, Dan, Martin, James H. 2008. Speech and Language Processing. New Jersey: Prentice Hall.
Malouf, Robert. 2016. Generating morphological paradigms with a recurrent neural network. San Diego Linguistic Papers 6, 122-129.
Mayhew, Anthony L., Skeat, Walter. 1888. A Concise Dictionary of Middle English From A.D. 1150 to 1580. Oxford: Clarendon Press.
Seyed, Hamid H., Mahdi, Samanipour. 2015. Prediction of Final Concentrate Grade Using Artificial Neural Networks from Gol-E-Gohar Iron Ore Plant. American Journal of Mining and Metallurgy 3-3, 58-62.
Takala, Pyry. 2016. Word Embeddings for Morphologically Rich Languages. Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 177-182.
Teijiro, Isokawa, Naruhiko, Nishimura, Nobuyuki, Matsui. 2012. Quaternionic Multilayer Perceptron with Local Analyticity. Information 3, 756-770.
Web 1 – Helsinki Corpus of English Texts. www.helsinki.fi/varieng/CoRD/corpora/HelsinkiCorpus (4 April, 2018).

Identifiers

URI

http://hdl.handle.net/11320/7504

DOI

10.15290/cr.2018.21.2.04

YADDA identifier

bwmeta1.element.hdl_11320_7504
