Analiza porównawcza wybranych metod szacowania błędu predykcji klasyfikatora

Herman, Sergiusz

doi:10.5604/01.3001.0014.1225

Article details

Journal

Przegląd Statystyczny

2016 | 63 | 4 | 449-463

Article title

Analiza porównawcza wybranych metod szacowania błędu predykcji klasyfikatora

Authors

Sergiusz Herman

Title variants

EN

Comparative Analysis of Selected Methods for Estimating the Prediction Error of Classifier

Abstracts

EN

Classification is a procedure that assigns the observations or objects under study to specific populations on the basis of their attributes. For this purpose an appropriate model, a classifier, is constructed. Its quality is measured primarily by its predictive ability, expressed among other measures by the true prediction error. Because a sufficiently large, independent test set is usually unavailable, this error often has to be estimated from the available learning set. The aim of this article is to review and empirically compare selected methods for estimating the prediction error of a classifier constructed with linear discriminant analysis. It was examined whether the results of the analysis depend on the sample size and on the method of selecting variables for the model. The empirical study was carried out on the example of predicting the bankruptcy of joint-stock companies in Poland.

PL

Klasyfikacją nazywamy algorytm postępowania, który przydziela badane obserwacje/obiekty, bazując na ich cechach do określonych populacji. W tym celu konstruowany jest odpowiedni model – klasyfikator. Miarą jego jakości jest przede wszystkim zdolność predykcyjna, mierzona m.in. za pomocą prawdziwego błędu predykcji. Wartość tego błędu, ze względu na brak odpowiednio dużej, niezależnej próby testowej, musi być często szacowana na podstawie dostępnej próby uczącej. Celem artykułu jest dokonanie przeglądu oraz empirycznej analizy porównawczej wybranych metod szacowania błędu predykcji klasyfikatora, skonstruowanego z wykorzystaniem liniowej analizy dyskryminacyjnej. Zbadano, czy wyniki analizy uzależnione są od wielkości próby oraz metody wyboru zmiennych do modelu. Badanie empiryczne zostało przeprowadzone na przykładzie problemu prognozowania upadłości spółek akcyjnych w Polsce.
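The estimators named in the abstract and keywords (holdout, cross-validation, bootstrapping) can be illustrated with a minimal sketch. It is not the author's implementation: the data are synthetic two-dimensional Gaussian classes rather than company data, the classifier is a hand-written two-class linear discriminant with equal priors, the bootstrap variant is a simplified .632 estimator in the spirit of Efron (1983), and all function names and parameter values (`fit_lda`, `n_errors`, `k=10`, `n_boot=50`, and so on) are assumptions made for this example.

```python
import random


def fit_lda(X, y):
    """Fit two-class LDA on 2-D points with equal priors; return (w, c)."""
    groups = {k: [p for p, t in zip(X, y) if t == k] for k in (0, 1)}
    means = {k: [sum(p[j] for p in g) / len(g) for j in (0, 1)]
             for k, g in groups.items()}
    # Pooled within-class covariance matrix (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for k, g in groups.items():
        for p in g:
            d = (p[0] - means[k][0], p[1] - means[k][1])
            for a in (0, 1):
                for b in (0, 1):
                    s[a][b] += d[a] * d[b] / (len(X) - 2)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = (means[1][0] - means[0][0], means[1][1] - means[0][1])
    # w = S^{-1} (mean_1 - mean_0), using the closed-form 2 x 2 inverse.
    w = ((s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (s[0][0] * diff[1] - s[1][0] * diff[0]) / det)
    # Threshold: projection of the midpoint between the class means.
    c = sum(w[j] * (means[0][j] + means[1][j]) / 2 for j in (0, 1))
    return w, c


def n_errors(X, y, train, test):
    """Train on the `train` indices and count misclassified `test` indices."""
    w, c = fit_lda([X[i] for i in train], [y[i] for i in train])
    return sum((w[0] * X[i][0] + w[1] * X[i][1] > c) != (y[i] == 1)
               for i in test)


def resubstitution(X, y):
    """Apparent error: train and test on the same learning set."""
    idx = range(len(X))
    return n_errors(X, y, idx, idx) / len(X)


def holdout(X, y, rng, test_frac=1 / 3):
    """Single random split into a training part and a test part."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_test = max(1, round(len(X) * test_frac))
    return n_errors(X, y, idx[n_test:], idx[:n_test]) / n_test


def k_fold_cv(X, y, rng, k=10):
    """k-fold cross-validation: every observation is tested exactly once."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    wrong = 0
    for f in range(k):
        test = set(idx[f::k])
        train = [i for i in idx if i not in test]
        wrong += n_errors(X, y, train, sorted(test))
    return wrong / len(X)


def bootstrap_632(X, y, rng, n_boot=50):
    """Simplified .632 bootstrap: mix apparent and out-of-bag error."""
    n, oob_rates = len(X), []
    for _ in range(n_boot):
        train = [rng.randrange(n) for _ in range(n)]
        drawn = set(train)
        oob = [i for i in range(n) if i not in drawn]
        if oob:  # skip the (very unlikely) replicate with no left-out points
            oob_rates.append(n_errors(X, y, train, oob) / len(oob))
    eps0 = sum(oob_rates) / len(oob_rates)
    return 0.368 * resubstitution(X, y) + 0.632 * eps0


def make_data(rng, n_per_class=30, shift=1.5):
    """Synthetic stand-in data: two unit-variance Gaussian classes."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append((rng.gauss(label * shift, 1.0),
                      rng.gauss(label * shift, 1.0)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_data(rng)
    print("resubstitution:", resubstitution(X, y))
    print("holdout:       ", holdout(X, y, rng))
    print("10-fold CV:    ", k_fold_cv(X, y, rng))
    print(".632 bootstrap:", bootstrap_632(X, y, rng))
```

Changing `n_per_class` in `make_data` gives a rough feel for how the estimates behave as the learning set shrinks, which is one of the two factors the article examines. Variable selection is not modelled in this sketch.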

Keywords

PL

błąd predykcji, walidacja krzyżowa, prosta metoda podziału, wielokrotne repróbkowanie, upadłość przedsiębiorstw, klasyfikacja

EN

prediction error, cross-validation, holdout method, bootstrapping, corporate bankruptcy, classification

Publisher

Główny Urząd Statystyczny

Journal

Przegląd Statystyczny

Year

2016

Volume

63

Issue

4

Pages

449-463

Dates

published

2016

Contributors

author

Sergiusz Herman

Uniwersytet Ekonomiczny w Poznaniu, Wydział Informatyki i Gospodarki Elektronicznej, Katedra Ekonometrii

References

Braga-Neto U. M., Dougherty E. R., (2004), Is Cross-validation Valid for Small-sample Microarray Classification?, Bioinformatics, 20 (3), 374–380.
Efron B., (1983), Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation, Journal of the American Statistical Association, 78 (382), 316–331.
Efron B., Tibshirani R. J., (1997), Improvements on Cross-Validation: The .632+ Bootstrap Method, Journal of the American Statistical Association, 92 (438), 548–560.
Gatnar E., (2001), Nieparametryczna metoda dyskryminacji i regresji, Wydawnictwo Naukowe PWN, Warszawa.
Gatnar E., (2008), Podejście wielomodelowe w zagadnieniach dyskryminacji i regresji, Wydawnictwo Naukowe PWN, Warszawa.
Geisser S., (1975), The Predictive Sample Reuse Method With Applications, Journal of the American Statistical Association, 70, 320–328.
Hadasik D., (1998), Upadłość przedsiębiorstw w Polsce i metody jej prognozowania, Zeszyty naukowe – seria II, Prace habilitacyjne, Zeszyt 153, Akademia Ekonomiczna w Poznaniu, Poznań.
Hanczar B., Dougherty E. R., (2013), The Reliability of Estimated Confidence Intervals for Classification Error Rates When Only a Single Sample is Available, Pattern Recognition, 46, 1067–1077.
Hand D. J., (1981), Discrimination and Classification, John Wiley & Sons, Chichester.
Isaksson A., Wallman M., Göransson H., Gustafsson M. G., (2008), Cross-Validation and Bootstrapping are Unreliable in Small Sample Classification, Pattern Recognition Letters, 29, 1960–1965.
Jiang W., Simon R., (2007), A Comparison of Bootstrap Methods and an Adjusted Bootstrap Approach for Estimating Prediction Error in Microarray Classification, Statistics in Medicine, 26, 5320–5334.
Kim J. H., (2009), Estimating Classification Error Rate: Repeated Cross-Validation, Repeated Hold-Out and Bootstrap, Computational Statistics and Data Analysis, 53, 3735–3745.
Lachenbruch P. A., Mickey M. R., (1968), Estimation of Error Rates in Discriminant Analysis, Technometrics, 10, 1–11.
McLachlan G. J., (1992), Discriminant Analysis and Statistical Pattern Recognition, John Wiley & Sons, Inc.
Molinaro A. M., Simon R., Pfeiffer R. M., (2005), Prediction Error Estimation: A Comparison of Resampling Methods, Bioinformatics, 21, 3301–3307.
Ripley B. D., (1996), Pattern Recognition and Neural Networks, Cambridge University Press.
Simon R., Radmacher M. D., Dobbin K., McShane L. M., (2003), Pitfalls in the Use of DNA Microarray Data for Diagnostic and Prognostic Classification, Journal of the National Cancer Institute, 95 (1), 14–18.
Wehberg S., Schumacher M., (2004), A Comparison of Nonparametric Error Rate Estimation Methods in Classification Problems, Biometrical Journal, 46, 35–47.

Identifiers

DOI

10.5604/01.3001.0014.1225

Biblioteka Nauki

1050557

YADDA identifier

bwmeta1.element.ojs-doi-10_5604_01_3001_0014_1225
