To predict the continuous value of a target variable y from explanatory variables X, multiple linear regression (MLR), y = Xβ + ε, is widely used, and many successful applications have been reported. To assess the accuracy of predicting y for new X, we usually follow a two-step procedure: construct the prediction formula using the training data, then assess its accuracy by applying the formula to the test data. Cross-validation or the bootstrap is often used to evaluate the accuracy statistically.
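The cross-validation procedure above can be sketched as follows; this is a generic illustration, not the authors' code, and the function names (k_fold_cv, fit, predict) are ours:

```python
import random

def k_fold_cv(xs, ys, k, fit, predict):
    """Estimate prediction error by k-fold cross-validation.

    fit(train_x, train_y) -> model; predict(model, x) -> yhat.
    Returns the mean squared error averaged over the k held-out folds.
    """
    idx = list(range(len(xs)))
    random.Random(0).shuffle(idx)          # fixed seed for reproducibility
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    mse_sum = 0.0
    for held_out in folds:
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errs = [(predict(model, xs[i]) - ys[i]) ** 2 for i in held_out]
        mse_sum += sum(errs) / len(errs)
    return mse_sum / k

# Toy check with a constant (mean) predictor as the model.
xs = list(range(10))
ys = [2.0 * x for x in xs]
fit_mean = lambda tx, ty: sum(ty) / len(ty)
predict_mean = lambda m, x: m
print(k_fold_cv(xs, ys, 5, fit_mean, predict_mean))
```

Each sample is held out exactly once, so the averaged error is an estimate of accuracy on unseen data.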
To improve the accuracy on the test data, a variety of regularization methods that impose penalty functions have been proposed; the ridge, the lasso, the elastic net, and the adaptive lasso are among them. In a sense, these regularization methods select the most appropriate feature variables. However, for some data sets the MLR may not work well because the target variable depends strongly and locally on the explanatory variables, whereas the MLR fits all the samples globally. For example, in predicting the prices of used cars at auctions, even regularization methods such as the ridge, the lasso, and their relatives could not improve the prediction accuracy much.
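To make the penalty idea concrete, here is a minimal one-feature ridge estimator (our simplification for illustration, assuming a centred feature and no intercept, where the closed form is beta = Σxy / (Σx² + λ)):

```python
def ridge_1d(xs, ys, lam):
    """Ridge estimate for a single centred feature with no intercept:
    beta = sum(x*y) / (sum(x^2) + lambda).  lam = 0 recovers OLS;
    larger lam shrinks the coefficient toward zero."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-4.1, -1.9, 0.0, 2.1, 3.9]
print(ridge_1d(xs, ys, 0.0))   # 2.0, the ordinary least-squares slope
print(ridge_1d(xs, ys, 5.0))   # smaller: the penalty shrinks the estimate
```

The shrinkage trades a little bias for lower variance, which is why it can help on test data; the lasso and elastic net differ only in the form of the penalty.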
In such applications, the k-nearest-neighbor method (kNN) applied to regression (NNR) can be an alternative. By collecting the k nearest-neighbor cars, we may expect that the more similar the cars, the more accurate the prediction. However, simple kNN regression alone could not improve the prediction accuracy much either: it selects nearby points, but it uses all the feature variables with equal weight.
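The plain kNN regression baseline can be sketched as follows (a generic illustration with our own function name, predicting the mean target of the k nearest points under an unweighted Euclidean distance):

```python
def knn_regress(train_x, train_y, query, k):
    """Predict y at `query` as the mean y of the k nearest training
    points, with every feature contributing equally to the distance."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    neighbours = [y for _, y in dists[:k]]
    return sum(neighbours) / k

train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [1.0, 2.0, 3.0, 10.0]
print(knn_regress(train_x, train_y, (0.2, 0.2), 3))  # 2.0, mean of the 3 near points
```

Because every feature enters the distance with equal weight, irrelevant features can dominate the neighborhood, which is the weakness the proposed method addresses.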
We therefore propose a combination of the MLR and the NNR, called NNRMLR. NNRMLR first performs multiple linear regression with a regularization method and obtains the effective feature variables together with their estimated coefficients. Then, using the estimated coefficient values as weights, we redefine the weighted distance between two points to be used in the NNR. That is, NNRMLR evaluates selected points with selected feature variables. After investigating the most appropriate value of k, we found that the prediction accuracy of the proposed method is higher than that of the MLR or the NNR alone in the used-car auction example.
In our previous research, we used the mean value of the neighbors for the NNR evaluation. Here, we use the regression value instead.
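The neighbor-selection step can be sketched as follows; this is one plausible reading of the description above, not the authors' implementation, and the function name and weighting by |β| are our assumptions:

```python
def nnrmlr_neighbours(train_x, train_y, beta, query, k):
    """Select k neighbours under a coefficient-weighted distance:
    each feature's squared difference is scaled by |beta_j|, so features
    the regression found unimportant barely affect which points are
    'near'.  Returns the target values of the selected neighbours."""
    w = [abs(b) for b in beta]
    dists = sorted(
        (sum(wj * (a - q) ** 2 for wj, a, q in zip(w, x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    return [y for _, y in dists[:k]]

# Feature 2 has a near-zero coefficient, so it is effectively ignored
# in the metric even though its raw values differ wildly.
beta = [2.0, 0.01]
train_x = [(1.0, 9.0), (1.1, -9.0), (5.0, 0.0)]
train_y = [10.0, 11.0, 50.0]
ys = nnrmlr_neighbours(train_x, train_y, beta, (1.0, 0.0), 2)
print(sum(ys) / len(ys))  # 10.5: mean of the two points near in feature 1
```

The returned neighbor targets can then be summarized by their mean (the earlier approach) or replaced by a regression value, as in the present work.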





