Results found: 2

Search results

Search:
in the keywords: repróbkowanie

Evaluation of resampling methods in the class unbalance problem

100%

Kubus M.

Econometrics. Ekonometria. Advances in Applied Data Analytics

2020

vol. 24, issue 1

39-50

Many real-world applications aim to predict rare events, so their training sets are highly unbalanced. In this case, classifiers are biased towards correctly predicting the majority class and misclassify the minority class, even though the rare events are of greater interest. To handle this problem, numerous techniques have been proposed that balance the data or modify the learning algorithms. The goal of this paper is to compare simple random balancing methods with more sophisticated resampling methods that have appeared in the literature and are available in R. Additionally, the authors ask whether learning on the original dataset and using a shifted classification threshold might be more competitive. The authors provide a survey from the perspective of regularized logistic regression and random forests. The results show that combining random under-sampling with random forests has an advantage over other techniques, while logistic regression can be competitive in the case of highly unbalanced data.

The aim of many practical applications of discriminant models is the prediction of rare events. The training sets are then unbalanced. In this case, classifiers tend to classify majority-class objects correctly while misclassifying many objects of the minority class, which is the class of particular interest. To solve this problem, many techniques have been proposed that balance the data or modify the learning algorithms. The aim of the article is to compare simple random balancing methods with more sophisticated ones that have appeared in the literature. In addition, the question is raised whether building a model on the original dataset and shifting the classification threshold is a competitive approach. The study is presented from the perspective of regularized logistic regression and random forests. The results show that the combination of under-sampling with random forests has an advantage over other techniques, whereas logistic regression can be competitive under strong imbalance.

Nieklasyczne procedury testowań wielokrotnych

100%

Denkowska S.

Przegląd Statystyczny

2013

vol. 60

issue 4

461-476

The range of applications of classical multiple testing procedures is limited by their model assumptions, and in many research situations classical solutions simply do not exist. In such cases, non-classical multiple testing procedures make it possible to control the effect of multiple testing. Marginal multiple testing procedures are computationally simple and widely applicable, but they do not take into account the joint distribution of the test statistics, which makes them more conservative than joint procedures. The range of applications of the joint procedures of Westfall and Young (1993), in turn, is limited by the subset pivotality requirement. An interesting alternative is offered by the joint procedures dedicated to genetic research, proposed by Dudoit and van der Laan (2008). Their important advantages are a wide range of applications, the possibility of choosing the Type I error rate, and widely available software (the MTP procedure is implemented in the multtest package in R). Unfortunately, research on the MTP procedure carried out by Werft and Benner (2009) revealed problems with controlling FDR when the number of tested hypotheses is very large and the sample sizes are small. In turn, the simulation experiment presented in the article showed that the MTP procedure does not ensure control of FWER at a prespecified level either.

The range of applications of classical multiple testing procedures is limited by their model assumptions, and in many cases classical solutions do not exist. In such situations, non-classical multiple testing procedures make it possible to control the effect of multiple testing. Although they are popular for their computational simplicity and wide range of applications, marginal multiple testing procedures do not take into account the joint distribution of the test statistics, which makes them more conservative than joint multiple testing procedures. The range of applications of the joint procedures introduced by Westfall and Young (1993) is limited by the subset pivotality requirement. Thus, the joint multiple testing procedures suggested by Dudoit and van der Laan (2008) seem very promising. A wide range of applications, the possibility of choosing the Type I error rate, and easily accessible software (the MTP procedure is implemented in the R multtest package) are their obvious advantages. Unfortunately, the analysis of the MTP procedure by Werft and Benner (2009) revealed that it does not control FDR when the number of hypotheses is very large and the samples are small. Furthermore, the simulation experiment presented in the article showed that the MTP procedure does not control FWER either.
