Results found: 2

Search results

Search:
in the keywords: k-means algorithm

The number of clusters in hybrid predictive models: does it really matter?

Łapczyński M., Jefmański B.

Przegląd Statystyczny

2019

vol. 66

issue 3

228-238

Researchers have long attempted to combine various analytical tools to build predictive models. It is possible to combine tools of the same type (ensemble models, committees) or tools of different types (hybrid models). Hybrid models are used in areas such as customer relationship management (CRM), web usage mining, medical sciences, petroleum geology and anomaly detection in computer networks. Our hybrid model was created as a sequential combination of cluster analysis and decision trees. In the first step of the procedure, objects were grouped into clusters using the k-means algorithm. The second step involved building a decision tree model with a new independent variable that indicated which cluster each object belonged to. The analysis was based on 14 data sets collected from publicly accessible repositories. The performance of the models was assessed using measures derived from the confusion matrix, including accuracy, precision, recall, F-measure, and the lift in the first and second deciles. We sought a relationship between the number of clusters and the quality of hybrid predictive models. To our knowledge, similar studies have not yet been conducted. Our research demonstrates that in some cases building hybrid models can improve the performance of predictive models. The models with the highest performance measures required a relatively large number of clusters (from 9 to 15).
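
The first step of the procedure described in the abstract (k-means grouping, then a new independent variable holding the cluster membership) can be illustrated with a minimal sketch. It is not the authors' code: the function names `kmeans` and `add_cluster_feature`, the random initialisation, the Euclidean distance and the toy data are all assumptions made for illustration. The decision tree of the second step is left out; any tree learner could be trained on the augmented rows.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: k distinct rows drawn at random serve as starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each object goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # no object changed cluster: converged
            break
        labels = new_labels
    return centroids, labels


def add_cluster_feature(rows, k, seed=0):
    """Append the cluster label to every row as an extra independent variable."""
    _, labels = kmeans(rows, k, seed=seed)
    return [tuple(row) + (label,) for row, label in zip(rows, labels)]


# Hypothetical toy data: two well-separated groups of three objects each.
rows = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.2), (7.9, 8.1), (8.3, 7.7)]
augmented = add_cluster_feature(rows, k=2)
for row in augmented:
    print(row)
```

Repeating the augmentation for a range of `k` values and comparing the resulting tree models is one way to probe the relationship between the number of clusters and model quality that the abstract refers to.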

Methods for imputation of missing values and their influence on the results of segmentation research

Gąsior M., Skowron Ł.

Econometrics. Ekonometria. Advances in Applied Data Analytics

2016

issue 4 (54)

61-71

Missing answers are a common problem in all types of research, especially in the social sciences. Hence a number of solutions have been developed, including the analysis of complete cases and imputation, which replaces a missing value with a value calculated according to one of several algorithms. This paper evaluates how the method adopted for handling missing answers influences the results of segmentation conducted with cluster analysis. To this end, we used a data set from an actual consumer study in which the cases with missing values were either deleted or filled in using various methods. Cluster analyses were then performed on those data sets, assuming both ordinal and ratio levels of measurement, and the grouping quality, as expressed by different indicators, was evaluated. The research showed the advantage of imputation over the analysis of complete cases; it also showed the validity of using more complex approaches than simple replacement with the mean or median value.
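
The two simplest strategies named in the abstract, complete-case analysis and replacement with a mean or median, can be sketched as follows. This is only an illustration under stated assumptions: the helper names `complete_cases` and `impute`, the use of `None` to mark a missing answer and the sample answers are hypothetical, and the more complex imputation approaches that the paper found preferable are not shown.

```python
from statistics import mean, median


def complete_cases(rows):
    """Keep only respondents with no missing answer (None marks a gap)."""
    return [row for row in rows if None not in row]


def impute(rows, fill=mean):
    """Replace each missing answer with a per-question summary of observed answers.

    `fill` is any function of a list of numbers, e.g. mean or median.
    A question with no observed answers at all raises StatisticsError.
    """
    columns = list(zip(*rows))
    fills = [fill([v for v in col if v is not None]) for col in columns]
    return [
        tuple(f if v is None else v for v, f in zip(row, fills))
        for row in rows
    ]


# Hypothetical survey: four respondents, three questions.
answers = [(5, 3, None), (4, None, 2), (1, 2, 2), (2, 1, 5)]

kept = complete_cases(answers)                 # two respondents survive
filled_mean = impute(answers, fill=mean)       # all four are retained
filled_median = impute(answers, fill=median)

print(kept)
print(filled_mean)
print(filled_median)
```

Deleting incomplete cases halves this toy sample, while either imputation keeps every respondent available for the subsequent cluster analysis.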
