Estimation of Optimal Sample Size in Decision Forest of SVM with Embedded Cross-validation Method


Faisal Zaman and Hideo Hirose

3rd Asian Conference on Intelligent Information and Database Systems (ACIIDS 2011), Daegu, Korea, April 20-22, 2011.

In this paper, the performance of the $m$-out-of-$n$ decision forest of SVM, built by subsampling without replacement at different subsampling ratios ($\frac{m}{n}$), is analyzed in terms of an \emph{embedded cross-validation} technique. The subsampling ratio plays a pivotal role in improving the performance of the decision forest of SVM, because the SVM in this ensemble enlarges the feature space of the underlying base decision tree classifiers and guarantees improved performance of the ensemble overall. To ensure better training of the SVM, the out-of-bag sample is generally kept larger, but there is no general rule for estimating the optimal sample size for the decision forest. We propose to use the embedded cross-validation method to select a near-optimal value of the sampling ratio. Under our criterion, a decision forest of SVM trained on independent samples whose size makes the cross-validation error of the ensemble as low as possible will produce improved generalization performance.
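The selection rule described above (pick the ratio $\frac{m}{n}$ whose cross-validation error is lowest) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: plain k-fold cross-validation stands in for the embedded cross-validation, a one-dimensional threshold rule (`fit_stub`) stands in for the decision tree plus SVM member, and all function names are hypothetical. The out-of-bag indices are returned but left unused, whereas in the decision forest of SVM they would be used to train the SVM.

```python
import random


def subsample_split(n, ratio, rng):
    """Draw m = round(ratio * n) indices without replacement; the rest are out-of-bag."""
    m = max(1, min(n - 1, round(ratio * n)))
    idx = list(range(n))
    rng.shuffle(idx)
    return idx[:m], idx[m:]


def fit_stub(xs, ys):
    """Stand-in base learner (NOT tree + SVM): threshold at the midpoint of the class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    if not pos or not neg:
        label = 1 if pos else 0  # only one class seen: predict it everywhere
        return lambda x: label
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    thr = (mean_pos + mean_neg) / 2
    if mean_pos >= mean_neg:
        return lambda x: 1 if x > thr else 0
    return lambda x: 1 if x < thr else 0


def fit_forest(xs, ys, ratio, n_members, rng):
    """Train each member on its own m-out-of-n subsample and combine by majority vote."""
    members = []
    for _ in range(n_members):
        in_bag, _oob = subsample_split(len(xs), ratio, rng)  # _oob unused by the stand-in
        members.append(fit_stub([xs[i] for i in in_bag], [ys[i] for i in in_bag]))
    return lambda x: 1 if 2 * sum(m(x) for m in members) > len(members) else 0


def cv_error(xs, ys, ratio, k=5, n_members=15, seed=0):
    """Mean k-fold misclassification rate of the ensemble at one subsampling ratio."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    fold_errors = []
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_forest([xs[i] for i in train], [ys[i] for i in train],
                           ratio, n_members, rng)
        fold_errors.append(sum(model(xs[i]) != ys[i] for i in fold) / len(fold))
    return sum(fold_errors) / k


def select_ratio(xs, ys, ratios, **kwargs):
    """Return the candidate ratio with the lowest cross-validation error, plus all errors."""
    errors = {r: cv_error(xs, ys, r, **kwargs) for r in ratios}
    return min(errors, key=errors.get), errors
```

A call such as `select_ratio(xs, ys, [0.2, 0.5, 0.8])` on a list of scalar features `xs` with 0/1 labels `ys` returns the chosen ratio together with the error measured at each candidate, so the full error curve over the grid can be inspected and not only its minimizer.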

Key Words
Optimal sampling ratio, Decision forest of SVM, Embedded cross-validation error.


