The Relationship between a Condition Number and Coefficients of Variation
This paper considers a linear model with a constant term, Y = Xβ + η. It analyses the problem of near-linear dependencies between the column of ones (corresponding to the constant term) and the remaining columns of the X matrix, which is a special case of multicollinearity. Multicollinearity is measured with the condition number of X^T X, computed after scaling each column of X to unit length. The condition number is the square root of the ratio of the largest eigenvalue of this matrix to the smallest. Some researchers suggest that values in excess of 20 indicate multicollinearity. In a model with one regressor, collinearity can be caused only by a near-linear dependence between the column of ones and the column of values of the explanatory variable, which means that the values of the explanatory variable show little variation. The variability of these values is measured with the coefficient of variation. The paper derives a formula relating the condition number to the coefficient of variation. Values of the condition number greater than 20 correspond to a coefficient of variation less than 0.1. In a multiple regression model the relationships between the columns of the X matrix are more complex. Examples demonstrate that multicollinearity is possible even when all coefficients of variation exceed 0.20 and the dependencies between the columns of values of the explanatory variables are weak. It is therefore argued that, to avoid this problem, it is not enough to examine separately the dependence between the column of ones and each other column of X; an analysis of the whole X matrix is required. A two-stage procedure for selecting explanatory variables, which first eliminates quasi-constant variables and then uses centered correlation coefficients, does not exclude multicollinearity.
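The relationship described above can be checked numerically. The sketch below is an illustration, not the paper's own derivation or examples. It assumes the coefficient of variation V is the population standard deviation divided by a positive mean. Under the abstract's definition of the condition number, the one-regressor case then reduces to κ = (1 + sqrt(1 + V²)) / V, a closed form worked out here that may differ in presentation from the formula in the paper. The function names, the synthetic data and the five-regressor construction are all assumptions made for illustration.

```python
import numpy as np

def condition_number(X):
    """Square root of the ratio of the largest to the smallest eigenvalue
    of X^T X, after scaling each column of X to unit length."""
    Xs = X / np.linalg.norm(X, axis=0)
    eigvals = np.linalg.eigvalsh(Xs.T @ Xs)  # returned in ascending order
    return float(np.sqrt(eigvals[-1] / eigvals[0]))

def kappa_from_cv(v):
    """Closed form for X = [1, x] with positive mean (assumed derivation):
    kappa = (1 + sqrt(1 + V^2)) / V, V = population sd / mean."""
    return (1.0 + np.sqrt(1.0 + v**2)) / v

rng = np.random.default_rng(0)

# One regressor: synthetic values with little variation around the mean.
x = rng.normal(loc=50.0, scale=5.0, size=200)
X_one = np.column_stack([np.ones_like(x), x])
v_one = x.std() / x.mean()  # np.std uses the population formula (ddof=0)
kappa_one = condition_number(X_one)
print("one regressor: V =", v_one, "kappa =", kappa_one,
      "closed form =", kappa_from_cv(v_one))

# Several regressors (hypothetical construction): each has V = 0.25 and the
# centered columns are exactly orthogonal, so pairwise correlations are zero.
n, k, mean, cv = 200, 5, 10.0, 0.25
Z = rng.normal(size=(n, k))
Z -= Z.mean(axis=0)                # center each column
Q, _ = np.linalg.qr(Z)             # orthonormal, still centered columns
regressors = mean + cv * mean * np.sqrt(n) * Q
X_multi = np.column_stack([np.ones(n), regressors])

cvs = regressors.std(axis=0) / regressors.mean(axis=0)
corr = np.corrcoef(regressors, rowvar=False)
kappa_multi = condition_number(X_multi)
print("several regressors: V =", cvs, "kappa =", kappa_multi)
```

In the second part every coefficient of variation exceeds 0.20 and the centered correlations vanish, yet the condition number of the whole matrix is above 20, which is consistent with the claim that the columns of X cannot be screened one at a time.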