Classification of Large Data Sets. Comparison of Performance of Chosen Algorithms

Dudek, Andrzej

Article details

Journal

Acta Universitatis Lodziensis. Folia Oeconomica

2013 | 285

Title variants

PL

Klasyfikacja dużych zbiorów porównanie wydajności wybranych algorytmów

Abstracts

EN

Researchers who analyze large (> 100,000 objects) data sets with cluster analysis methods often run into the computational complexity of the algorithms, which sometimes makes it impossible to complete the analysis in an acceptable time. A common solution is to use less computationally complex algorithms (such as k-means), which in many cases give much worse results than, for example, algorithms based on eigenvalue decomposition. The results of analyses of real data sets of this size are therefore usually a compromise between quality and the computational capabilities of computers. This article attempts to present the current state of knowledge on the classification of large data sets and to identify directions for development and open problems.

PL

Researchers who use cluster analysis methods to analyze large (> 100,000 objects) data sets often face the problem of the computational complexity of the algorithms, which sometimes makes it impossible to carry out the analysis in an acceptable time. One solution to this problem is to use less computationally complex algorithms (hierarchical agglomerative, k-means), which in turn can in many situations give markedly worse results than, for example, algorithms that use eigenvalue decomposition. The results of real analyses of data sets of this kind are therefore usually a compromise between quality and the computational capabilities of computers. The article attempts to present the current state of knowledge on the classification of large data sets and to indicate directions for development and open problems.

Keywords

EN

clustering, classification, large data sets

Publisher

Uniwersytet Łódzki. Wydawnictwo Uniwersytetu Łódzkiego

Year

2013

Volume

285

Dates

published

2013

Contributors

author

Dudek, Andrzej

other

Wrocław, University of Economics, Chair of Econometrics and Informatics

Identifiers

URI

http://hdl.handle.net/11089/10038

YADDA identifier

bwmeta1.element.hdl_11089_10038
