2016 | 2 (52) | 35-42
Article title

Regression analysis for interval-valued symbolic data versus noisy variables and outliers

Title variants
Regresja liniowa danych symbolicznych a zmienne zakłócające i obserwacje odstające
Abstract
Regression analysis is perhaps the best-known and most widely used method for the analysis of dependence, that is, for examining the relationship between a set of independent variables (X's) and a single dependent variable (Y). In general, the regression model is a linear combination of independent variables that corresponds as closely as possible to the dependent variable [Lattin, Carroll, Green 2003, p. 38]. The aim of the article is to present two adaptations of regression analysis to symbolic interval-valued data (the centre method and the centre and range method) and to compare their usefulness when dealing with noisy variables and/or outliers. The empirical part of the paper presents the results of simulation studies based on artificial and real data, both without and with noisy variables and/or outliers. The results are compared according to the values of two coefficients of determination, $R_L^2$ and $R_U^2$. The results show that the centre and range method usually obtains better results even when the data set contains noisy variables and outliers, but in some cases the centre method performs better.
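The two methods named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the usual formulation in which the centre method fits one least-squares model on interval midpoints and applies it to the lower and upper bounds, while the centre and range method fits a second model on half-ranges and combines both predictions. The function names, the toy data and the use of the ordinary coefficient of determination, computed separately for the lower and upper bounds, are assumptions made for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    """Apply a fitted coefficient vector (intercept first) to X."""
    return np.column_stack([np.ones(len(X)), X]) @ beta

def r2(y, y_hat):
    """Ordinary coefficient of determination (assumed definition)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def centre_method(X_lo, X_hi, y_lo, y_hi):
    """Fit on midpoints, then predict each bound from the matching X bound."""
    beta = fit_ols((X_lo + X_hi) / 2, (y_lo + y_hi) / 2)
    return predict(beta, X_lo), predict(beta, X_hi)

def centre_range_method(X_lo, X_hi, y_lo, y_hi):
    """Fit one model on midpoints and one on half-ranges, then combine."""
    Xc, yc = (X_lo + X_hi) / 2, (y_lo + y_hi) / 2
    Xr, yr = (X_hi - X_lo) / 2, (y_hi - y_lo) / 2
    c_hat = predict(fit_ols(Xc, yc), Xc)
    r_hat = predict(fit_ols(Xr, yr), Xr)
    return c_hat - r_hat, c_hat + r_hat

# Toy interval data (hypothetical): midpoints and half-ranges each follow
# an exact linear rule, with no noisy variables or outliers.
rng = np.random.default_rng(0)
xc = rng.uniform(0, 10, size=(50, 1))
xr = rng.uniform(0.5, 2, size=(50, 1))
X_lo, X_hi = xc - xr, xc + xr
yc = 1 + 2 * xc[:, 0]
yr = 0.5 + 0.5 * xr[:, 0]
y_lo, y_hi = yc - yr, yc + yr

cm_lo, cm_hi = centre_method(X_lo, X_hi, y_lo, y_hi)
crm_lo, crm_hi = centre_range_method(X_lo, X_hi, y_lo, y_hi)

print("centre method:           R2_L, R2_U =", r2(y_lo, cm_lo), r2(y_hi, cm_hi))
print("centre and range method: R2_L, R2_U =", r2(y_lo, crm_lo), r2(y_hi, crm_hi))
```

On this toy data the half-ranges of Y depend on the half-ranges of X in a way the midpoint model alone cannot capture, which is the situation the second model is meant to address. It says nothing about behaviour under noisy variables or outliers, which is what the article's simulations examine.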
Bibliography
  • Billard L., Diday E., 2006, Symbolic Data Analysis. Conceptual Statistics and Data Mining, John Wiley & Sons, Chichester.
  • Bock H.-H., Diday E. (eds.), 2000, Analysis of Symbolic Data. Explanatory Methods for Extracting Statistical Information from Complex Data, Springer Verlag, Berlin-Heidelberg.
  • Diday E., Noirhomme-Fraiture M. (eds.), 2008, Symbolic Data Analysis and the SODAS Software, John Wiley & Sons, Chichester.
  • Dudek A., 2013, Metody analizy danych symbolicznych w badaniach ekonomicznych, Wyd. UE we Wrocławiu, Wrocław.
  • Hair J.F., Black W.C., Babin B.J., Anderson R.E., Tatham R.L., 2006, Multivariate Data Analysis, Prentice Hall, New Jersey.
  • Lattin J., Carroll J.D., Green P.E., 2003, Analyzing Multivariate Data, Thomson Learning, Toronto.
  • Lima-Neto E.A., de Carvalho F.A.T., 2008, Centre and range method for fitting a linear regression model to symbolic interval data, Computational Statistics and Data Analysis, vol. 52, pp. 1500–1515.
  • Lima-Neto E.A., de Carvalho F.A.T., 2010, Constrained linear regression models for symbolic interval-valued variables, Computational Statistics and Data Analysis, vol. 54, pp. 333–347.
  • Milligan G.W., Cooper M.C., 1985, An examination of procedures for determining the number of clusters in a data set, Psychometrika, vol. 50, no. 2, pp. 159–179.
  • Qiu W., Joe H., 2006, Generation of random clusters with specified degree of separation, Journal of Classification, vol. 23, pp. 315–334.
  • Walesiak M., Dudek A., 2014, The clusterSim package [URL:]
  • Walesiak M., Gatnar E. (eds.), 2004, Metody statystycznej analizy wielowymiarowej w badaniach marketingowych, Wyd. Akademii Ekonomicznej im. Oskara Langego we Wrocławiu, Wrocław.
  • Welfe A., 2013, Ekonometria, PWN, Warszawa.