Modification of Hinov Method of Variable Selection for Multiple Cluster Structure Analysis

Korzeniewski, Jerzy

Article details

Journal

Acta Universitatis Lodziensis. Folia Oeconomica

2013 | 286

Article title

Modification of Hinov Method of Variable Selection for Multiple Cluster Structure Analysis

Authors

Korzeniewski, Jerzy

Title variants

PL

Modyfikacja metody HINoV selekcji zmiennych w analizie wielokrotnych struktur skupień

Abstracts

EN

The original HINoV method (Carmone et al., 1999) is not robust to the presence of correlated unimodal and uniform variables among the noisy variables (e.g. Korzeniewski, 2012). Moreover, HINoV can be applied only to the analysis of a single cluster structure. The article proposes a modification in which all variables are grouped, separately for each reference variable, into two classes: one consists of variables similar to the reference variable, the other of variables that are “less similar”. The similarity between two variables is based on the similarity of the partitions of the data set into a fixed number of clusters (from 2 to 10), measured with the modified Rand index. The result is a zero-one matrix describing the relations between every pair of variables. Then the set of variables creating the same (the strongest) cluster structure is selected by means of a criterion optimizing the division of the matrix into four blocks. After completing the first-stage selection, one can search for another cluster structure by applying the same procedure to the set of remaining variables. The modification is assessed in a broad experiment based on 2250 data sets generated from mixtures of normal distributions.
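
The construction of the zero-one matrix described in this abstract can be illustrated with a minimal sketch. It assumes that each variable has already been used on its own to partition the data set (the abstract considers 2 to 10 clusters), and it uses the Hubert-Arabie adjusted Rand index as a stand-in for the "modified Rand index". The rule for splitting variables into "similar" and "less similar" (a cut at the largest gap in the sorted similarities) and the function names are assumptions made for illustration; they are not taken from the article.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Hubert-Arabie adjusted Rand index of two labelings of the same objects."""
    total = comb(len(a), 2)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case: both partitions are trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def zero_one_matrix(partitions):
    """partitions[v] is the labeling obtained by clustering on variable v alone.

    Row i marks with 1 the variables judged similar to reference variable i.
    """
    m = len(partitions)
    sim = [[adjusted_rand_index(partitions[i], partitions[j]) for j in range(m)]
           for i in range(m)]
    z = [[0] * m for _ in range(m)]
    for i in range(m):
        z[i][i] = 1
        others = sorted((j for j in range(m) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        gaps = [sim[i][others[k]] - sim[i][others[k + 1]]
                for k in range(len(others) - 1)]
        # Assumed two-class rule: cut at the largest gap; if no variable
        # stands out (all similarities equal), mark none as similar.
        cut = gaps.index(max(gaps)) + 1 if gaps and max(gaps) > 0 else 0
        for j in others[:cut]:
            z[i][j] = 1
    return z


# Hypothetical per-variable partitions of 8 objects: variables 0-2 agree
# (up to label names), variables 3 and 4 induce unrelated partitions.
partitions = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
]
for row in zero_one_matrix(partitions):
    print(row)
```

In this toy input, variables 0 to 2 are marked as mutually similar and form a block of ones, which is the kind of block the four-block criterion in the abstract is meant to isolate; the criterion itself is not sketched here because the abstract does not specify it.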

PL (translated)

The original HINoV method is not at all robust to the presence of correlated unimodal or uniform variables among the variables that contaminate the cluster structure. Moreover, HINoV can be applied only when there is a single cluster structure. The paper proposes a modification in which, separately for each fixed reference variable, the variables are grouped into two classes, those similar and those dissimilar to it, in terms of the similarity of the partitions of the data set into a given number of clusters (from 2 to 10). This yields a zero-one matrix describing the relations between every pair of variables. Next, the subset of variables forming the same (the strongest) cluster structure is selected by means of a criterion optimizing the division of the matrix into four blocks. Once the variables forming one cluster structure have been selected, one can, in a further step, select the variables forming the next cluster structure from among the variables not selected in the first step. To select the proper block of the matrix, a partition stability criterion is used, based on repeatedly drawing a random half of the data set and comparing the partitions obtained with the k-means method. The modification is assessed in an extensive simulation experiment on 2250 data sets generated as mixtures of normal distributions.
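
The partition stability criterion mentioned in this abstract (repeatedly drawing half of the data set and comparing k-means partitions) can be sketched as follows. This is only one plausible reading: it assumes that each half-sample partition is compared with the full-data partition restricted to the same objects, that agreement is measured with the adjusted Rand index, and that the scores are averaged. The plain Lloyd's k-means, the function names, and the parameters `draws` and `seed` are hypothetical and not taken from the article.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Hubert-Arabie adjusted Rand index of two labelings of the same objects."""
    total = comb(len(a), 2)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case: both partitions are trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def kmeans_labels(points, k, rng, iters=50):
    """Plain Lloyd's k-means on tuples of floats; returns one label per point."""
    centers = rng.sample(points, k)  # initial centers: k data points
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def partition_stability(points, k, draws=20, seed=0):
    """Mean agreement between full-data and half-sample k-means partitions."""
    rng = random.Random(seed)
    full = kmeans_labels(points, k, rng)
    scores = []
    for _ in range(draws):
        idx = rng.sample(range(len(points)), len(points) // 2)
        half = kmeans_labels([points[i] for i in idx], k, rng)
        scores.append(adjusted_rand_index([full[i] for i in idx], half))
    return sum(scores) / draws


# Hypothetical one-dimensional data with two well-separated groups.
points = [(0.1 * i,) for i in range(10)] + [(10 + 0.1 * i,) for i in range(10)]
print(partition_stability(points, k=2))
```

Under this reading, a set of variables that carries a genuine cluster structure should give a stability close to 1, while a set of noisy variables should give a noticeably lower value.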

Keywords

EN

cluster analysis; variable choice; multiple cluster structures

Publisher

Uniwersytet Łódzki. Wydawnictwo Uniwersytetu Łódzkiego

Year

2013

Volume

286

Physical description

Dates

published

2013

Contributors

author

Korzeniewski, Jerzy

other

University of Lodz, Department of Statistical Methods

Identifiers

URI

http://hdl.handle.net/11089/10324

YADDA identifier

bwmeta1.element.hdl_11089_10324
