NNRMLR3: Further Improved Combination Method of Nearest Neighbor Regression and Multiple Linear Regression


Hideo Hirose


2nd International Symposium on Applied Engineering and Sciences (SAES2014), Big Data Session 2, December 20-21, 2014, Fukuoka, Japan

To predict the continuous value of a target variable y from the values of explanatory variables X, we often use multiple linear regression (MLR), y = Xβ, and many successful applications have been reported. To assess the prediction accuracy of y for new X, we usually perform two procedures: constructing the prediction formula (structure) using the training data, and assessing the accuracy by applying the prediction formula to the test data. The cross-validation method or the bootstrap method is often used to evaluate the accuracy statistically.
To improve the accuracy for the test data, a variety of regularization methods that impose penalty functions have been proposed, e.g., the ridge, lasso, elastic net, and adaptive lasso. The regularization methods, in a sense, select the most appropriate feature variables. However, for some data the MLR may not work well because the target variable depends strongly on the explanatory variables only locally, whereas the MLR fits a single model to all the samples. For example, in predicting the prices of used cars in auctions, even regularization methods such as the ridge, lasso, and their relatives could not improve the prediction accuracy much.
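For reference, the penalized least-squares objectives behind the regularization methods named above can be written as follows. This is a sketch in one common parametrization; the symbols β (coefficient vector), λ, λ₁, λ₂ (penalty strengths), and w_j (adaptive weights) are notation assumed here, not taken from the paper.

```latex
\begin{align*}
\text{ridge:} \quad
  & \hat\beta = \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2 \\
\text{lasso:} \quad
  & \hat\beta = \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 \\
\text{elastic net:} \quad
  & \hat\beta = \arg\min_{\beta}\; \|y - X\beta\|_2^2
    + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2 \\
\text{adaptive lasso:} \quad
  & \hat\beta = \arg\min_{\beta}\; \|y - X\beta\|_2^2
    + \lambda \sum_{j} w_j\, |\beta_j|
\end{align*}
```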
In such applications, k nearest-neighbor regression (NNR) can be an alternative. We may expect that the more similar the k nearest-neighbor cars collected, the more accurate the prediction. However, the simple k-NN regression method alone could not improve the prediction accuracy much either. It uses only selected points, but it still uses all the feature variables.
We thus propose a method that combines the MLR and the NNR, called NNRMLR. The NNRMLR first performs multiple linear regression with a regularization method and obtains the effective feature variables, whose estimated parameters serve as weights. Using the values of the estimated parameters, we then redefine the weighted distance between two points, which is used in the NNR. That is, the NNRMLR evaluates with selected points and selected feature variables. After investigating the most appropriate value of k, we found that the prediction accuracy of the proposed method is better than that of the MLR or the NNR in the used car auction example.
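The two-stage idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: ridge is used as the regularization method, the weights are taken as the absolute values of the estimated coefficients, and the distance is a coefficient-weighted Euclidean distance. The function names `ridge_fit` and `nnrmlr_mean_predict` and the parameters `alpha` and `k` are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Ridge regression on centered data; returns (intercept, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Closed-form ridge solution: (Xc'Xc + alpha*I)^-1 Xc'yc
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta

def nnrmlr_mean_predict(X_train, y_train, x_new, k=5, alpha=1.0):
    """Predict y at x_new as the mean of the k nearest training targets,
    with nearness measured by a coefficient-weighted distance."""
    _, beta = ridge_fit(X_train, y_train, alpha)
    w = np.abs(beta)  # assumption: |coefficient| as feature weight
    # Weighted Euclidean distance from x_new to every training point
    d = np.sqrt((((X_train - x_new) * w) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]  # indices of the k closest points
    return y_train[nearest].mean()
```

With this choice of weights, a feature whose coefficient is shrunk toward zero contributes almost nothing to the distance, so neighbors are selected mainly on the features the linear fit found effective.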
In the previous research, we used the mean value for NNR evaluation. Here, we use the regression value for NNR evaluation instead.
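The difference between the two evaluations can be illustrated with a short sketch. It assumes that "regression value" means the value of a linear regression fitted to the k selected neighbors and evaluated at the query point. That reading, the function name `nnr_regression_predict`, and the use of ordinary least squares for the local fit are assumptions made for illustration. The feature weights are passed in, e.g. the absolute coefficients from a regularized MLR fit.

```python
import numpy as np

def nnr_regression_predict(X_train, y_train, x_new, weights, k=5):
    """Predict y at x_new from a linear regression fitted only to the
    k nearest neighbors under a feature-weighted Euclidean distance."""
    d = np.sqrt((((X_train - x_new) * weights) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]  # indices of the k closest points
    # Design matrix with an intercept column for the local fit
    A = np.column_stack([np.ones(len(nearest)), X_train[nearest]])
    coef, *_ = np.linalg.lstsq(A, y_train[nearest], rcond=None)
    # Evaluate the local regression at the query point
    return np.concatenate([[1.0], x_new]) @ coef
```

When the query point is not at the center of its neighbors, the neighbor mean is pulled toward the neighbors' average target, while the local regression value follows the local trend.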
