Wykorzystanie wskaźnika Dunna do ustalania optymalnej liczby skupień (Use of Dunn's index to determine the optimal cluster number)

Migdał-Najman, Kamila; Najman, Krzysztof

Article details

Journal

Wiadomości Statystyczne. The Polish Statistician

2008 | No 11 | 26-35

Article title

Wykorzystanie wskaźnika Dunna do ustalania optymalnej liczby skupień

Authors

Kamila Migdał-Najman, Krzysztof Najman

Title variants

Use of Dunn's index to determine the optimal cluster number

Languages of publication

Polish

Abstracts

The article presents a method for assessing the quality of a clustering. Cluster analysis aims to separate groups of objects so that they are minimally diverse internally and maximally diverse externally. A partition that achieves this is said to be of "high quality". This quality is also interpreted in the context of the chosen number of clusters: if a partition is of high quality, the number of clusters has been determined correctly. At the same time, one of the critical parameters required by many classical clustering methods is the number of clusters into which a given set of objects should be divided. The article describes one uncomplicated method that makes it possible to assess the quality of a clustering and, thereby, to determine the number of clusters in a data set. The theoretical foundations of Dunn's index are presented, and its properties are verified in a simulation experiment. Summary results of the simulation study and the conclusions drawn from the analysis are also given. (original abstract)
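The record does not state which form of Dunn's index the article uses. As a rough illustration only, the sketch below assumes the form commonly attributed to Dunn (1974): the smallest Euclidean distance between points of different clusters divided by the largest cluster diameter (largest distance between two points of the same cluster). Larger values indicate compact, well-separated clusters, so the number of clusters can be chosen by comparing candidate partitions and keeping the one with the highest index. The function names `dunn_index` and `best_k`, and the idea of passing pre-computed labelings keyed by the number of clusters, are hypothetical choices made for this example.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Dunn index (assumed variant): min between-cluster distance / max cluster diameter.

    Between-cluster distance is the minimum pairwise Euclidean distance
    (single linkage); a cluster's diameter is its maximum pairwise distance.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    # Largest diameter over all clusters (singleton clusters contribute 0).
    max_diameter = 0.0
    for members in clusters.values():
        for a, b in combinations(members, 2):
            max_diameter = max(max_diameter, math.dist(a, b))

    # Smallest distance between two points that lie in different clusters.
    min_separation = math.inf
    for (_, first), (_, second) in combinations(clusters.items(), 2):
        for a in first:
            for b in second:
                min_separation = min(min_separation, math.dist(a, b))

    if max_diameter == 0.0:
        return math.inf  # every cluster is a single point (or duplicates)
    return min_separation / max_diameter


def best_k(points, labelings):
    """Hypothetical helper: return the k whose labeling maximises the Dunn index.

    `labelings` maps a number of clusters k to a list of cluster labels,
    produced by any clustering method run with that k.
    """
    return max(labelings, key=lambda k: dunn_index(points, labelings[k]))


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 0), (5, 1)]
    candidates = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
    for k, labels in candidates.items():
        print(k, dunn_index(pts, labels))
    print("chosen k:", best_k(pts, candidates))
```

With the two tight, distant pairs above, the two-cluster partition scores 5.0 and the three-cluster partition 1.0, so k = 2 is chosen. The brute-force pairwise loops are quadratic in the number of points, which keeps the definition visible but would be slow on large data sets.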

Keywords

Cluster analysis, Statistical analysis, Algorithms

Publisher

Główny Urząd Statystyczny (Central Statistical Office of Poland)

Journal

Wiadomości Statystyczne. The Polish Statistician

Year

2008

Issue

No 11

Pages

26-35

Contributors

author

Kamila Migdał-Najman

author

Krzysztof Najman

References

Ball G., Hall D. J. (1965), ISODATA, a novel method of data analysis and pattern classification, Stanford Research Institute, Menlo Park
Bolshakova N., Azuaje F. (2003), Cluster validation techniques for genome expression data, "Signal Processing", No 83
Caliński T., Harabasz J. (1974), A dendrite method for cluster analysis, "Communications in Statistics", No 3
Davies D. L., Bouldin D. W. (1979), A cluster separation measure, "IEEE Transactions on Pattern Analysis and Machine Intelligence", No 1
Dunn J. C. (1974), Well separated clusters and optimal fuzzy partitions, "Journal of Cybernetics", No 4
Friedman H. P., Rubin J. (1967), On some invariant criteria for grouping data, "Journal of the American Statistical Association", No 62
Hartigan J. A. (1975), Clustering Algorithms, Wiley, New York
Kaufman L., Rousseeuw P. J. (1990), Finding Groups in Data, Wiley-Interscience, John Wiley & Sons
Maimon O., Rokach L. (2005), Data Mining and Knowledge Discovery Handbook, Springer
Migdał-Najman K., Najman K. (2005), Analityczne metody ustalania liczby skupień [Analytical methods for determining the number of clusters], "Prace Naukowe Akademii Ekonomicznej we Wrocławiu", No 1076, Taksonomia 12: Klasyfikacja i analiza danych - teoria i zastosowania, Wrocław
Migdał-Najman K., Najman K. (2006), Analityczne metody ustalania liczby skupień w rozmytych zbiorach danych [Analytical methods for determining the number of clusters in fuzzy data sets], "Prace Naukowe Akademii Ekonomicznej we Wrocławiu", Taksonomia 13: Klasyfikacja i analiza danych - teoria i zastosowania, Wrocław
Scott A. J., Symons M. J. (1971), Clustering methods based on likelihood ratio criteria, "Biometrics", No 27

Identifiers

YADDA identifier

bwmeta1.element.ekon-element-000153940838
