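We consider the problem of searching, among N points that have feature variables in a multidimensional space and take a binary (0/1) response, for regions where points showing response 1 are denser than elsewhere (bumps, or hot spots). So far, we have shown that a bump hunting method using the decision tree is effective in making the search results easy to use for prediction, and, to find the optimum, we have proposed a new method that uses extreme-value statistics in addition to a probabilistic search method (GA). However, the accuracy of the results obtained was not yet known. Here, using training data and test data, we point out how serious this problem is, and then describe the accuracy of the optimally searched results. We also propose a method that combines the merits of the test sample method and the bootstrap.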
Suppose that we are interested in searching for regions in a
z-dimensional space of feature variables where points showing
response 1 are denser than elsewhere, each point being assigned
response 1 or response 0 as its target value; such a region is
called a bump or a hot spot. In a series of previous studies, we
have shown that bump hunting using the decision tree is useful
from the viewpoints of ease of use and prediction capability, and
have developed a new bump hunting method using probabilistic (GA)
and statistical (extreme-value statistics) methods. However, the
accuracy of the estimated maximum capture rate was assessed using
the simple bootstrap method without a correction formula. We had
not seriously considered the bias and the variance of the
predicted estimate; we are now aware, however, that the value of
the predicted estimate must be treated very carefully. We
therefore propose a new method for assessing the prediction error
in the bump hunting problem, in which the test sample method and
the bootstrap method are combined.