Heteroscedastic Discriminant Analysis Combined with Feature Selection for Credit Scoring

Stąpor, Katarzyna; Smolarczyk, Tomasz; Fabian, Piotr

Article details

Journal

Statistics in Transition new series

2016 | 17 | 2 | 265-280

Article title

Heteroscedastic Discriminant Analysis Combined with Feature Selection for Credit Scoring

Authors

Stąpor Katarzyna , Smolarczyk Tomasz , Fabian Piotr

Content


Languages of publication

EN

Abstracts

EN

Credit granting is a fundamental decision and one of the most complex tasks facing every credit institution. Credit scoring databases are typically large and contain redundant and irrelevant features. An effective classification model gives managers an objective basis for decisions in place of intuition and experience. This study proposes an approach to building a credit scoring model that combines the heteroscedastic extension (Loog, Duin, 2002) of classical Fisher Linear Discriminant Analysis (Fisher, 1936; Krzyśko, 1990) with a feature selection algorithm that retains sufficient information for classification. We tested five feature subset selection algorithms: two filters and three wrappers. To evaluate the accuracy of the proposed credit scoring model and to compare it with existing approaches, we used the German credit data set from the study by Chen and Li (2010). The results suggest that the proposed hybrid approach is an effective and promising method for building credit scoring models.

Keywords

EN

heteroscedastic discriminant analysis, feature subset selection, variable importance, credit scoring model

Publisher

Główny Urząd Statystyczny

Journal

Statistics in Transition new series

Year

2016

Volume

17

Issue

2

Pages

265-280

Physical description

Contributors

author

Stąpor Katarzyna

katarzyna.stapor@polsl.pl

Institute of Computer Science, Silesian University of Technology

author

Smolarczyk Tomasz

tomasmo356@student.polsl.pl

Institute of Computer Science, Silesian University of Technology

author

Fabian Piotr

piotr.fabian@polsl.pl

Institute of Computer Science, Silesian University of Technology

References

CHEN, F., LI, F., (2010). Combination of feature selection approaches with SVM in credit scoring, Expert Systems with Applications, Vol. 37, pp. 4902–4909.
COVER, T., THOMAS, J., (1991). Elements of information theory. John Wiley & Sons, New York, NY.
CROOK, J. N., EDELMAN, D. B., THOMAS, L. C., (2007). Recent developments in consumer credit risk assessment. European Journal of Operational Research 183 (3), pp. 1447–1465.
DASH, M., LIU, H., (1997). Feature selection for classification. Intelligent Data Analysis, 1, pp. 131–156.
DUDA, R., HART, P., STORK, D., (2001). Pattern Classification. John Wiley & Sons, New York, 2 ed.
FEO, T. A., RESENDE, M. G. C., (1995). Greedy randomized adaptive search procedures. J. Global Optim. 2, pp. 1–27.
FISHER, R., (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, pp. 179–188.
FUKUNAGA, K., (1990). Introduction to statistical pattern recognition. New York: Academic Press.
GOLDBERG, D., (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley Professional.
HALL, M., SMITH, L., (1997). Feature subset selection: a correlation based filter approach, in International Conference on Neural Information Processing and Intelligent Information Systems, Berlin.
KRZYŚKO, M., (1990). Discriminant analysis, WNT, Warszawa (in Polish).
KRZYŚKO, M., WOŁYŃSKI, W., (1996). Discriminant rules based on distances, Tatra Mountains Math. Publ. 7(1996), pp. 289–296.
LOOG, M., DUIN, R., (2002). Non-iterative heteroscedastic linear dimension reduction for two-class data: from Fisher to Chernoff. Proc. 4th Int. Workshop S+SSPR, pp. 508–517.
MATUSZCZYK, A., (2012). Credit scoring. Warszawa: CeDeWu Sp. z o.o.
MOSCATO, P., (2002). Memetic algorithms. In Pardalos, P.M., Resende, M. (eds.): Handbook of Applied Optimization. Oxford: Oxford University Press, pp. 157–167.
PACHECO, J., et al., (2006). Analysis of new variable selection methods in discriminant analysis, Computational Statistics & Data Analysis, Vol. 51, 3, pp. 1463–1478.
PUDIL, P., et al., (1994). Floating search methods in feature selection, Pattern Recognition Letters, Vol. 15, 11, pp. 1119–1125.
SOMOL, P., et al., (2005). Filter- versus Wrapper-based Feature Selection For Credit Scoring, International Journal of Intelligent Systems, Vol. 20 (10), pp. 985–999.
SPENCE, C., SAJDA, P., (1998). Role of feature selection in building pattern recognizers for computer-aided diagnosis, in Medical Imaging 1998: Image Processing, San Diego.
STĄPOR, K., (2011). Classification methods in computer vision. PWN, Warszawa (in Polish).
STĄPOR, K., (2015). Better alternatives for stepwise discriminant analysis. Acta Universitatis Lodziensis, Folia Oeconomica, Multivariate Statistical Analysis in Theory and Practice, nr 1(311), Lodziensis University Press, pp. 9–15.
THOMAS, L. C., (2000). A survey of credit and behavioural scoring: Forecasting financial risk of lending to consumers. International Journal of Forecasting 16 (2), pp. 149–172.
THOMAS, L. C., OLIVER, R. W., HAND, D. J., (2005). A survey of the issues in consumer credit modelling research. Journal of the Operational Research Society 56 (9), pp. 1006–1015.
ZHANG, D., ZHOU, X., LEUNG, S. C. H., ZHENG, J., (2010). Vertical bagging decision trees model for credit scoring. Expert Systems with Applications 37 (12), pp. 7838–7843.
