Bias Correction of Tradeoff Curve in the Bump Hunting Using the Tree-GA


Yu Aizawa, Hideo Hirose


The 2nd BMIRC International Symposium on Frontiers in Computational Systems Biology and Bioengineering, January 29 - 30, 2013, Fukuoka, Japan.


The bump hunting, proposed by Friedman and Fisher, has become important in many fields. Suppose that we are interested in finding regions where target points are denser than other regions. Such dense regions of target points are called the bumps, and finding them is called bump hunting. By pre-specifying a pureness rate in advance, a maximum capture rate could be obtained. Then, a trade-off curve between the two can be constructed. Thus, to find the bump regions is equivalent to construct the trade-off curve. When we adopt simpler boundary shapes for the bumps such as the union of boxes located parallel to some explanation variable axes, it would be convenient to adopt the binary decision tree. Since the conventional binary decision tree will not provide the maximum capture rates, we use the genetic algorithm (GA), specified to the tree structure, the tree-GA. Assessing the accuracy for the trade-off curve in typical fundamental cases, the proposed tree-GA can construct the effective trade-off curve which is close to the optimal one comparing with that obtained by using the PRIM (Patient Rule Induction Method) proposed by Friedman and Fisher.
To apply this method in the medical field, we have to overcome the difficulty of N-P problem, i.e., the number of samples, N, is less than the number of feature variables, P. For higher dimension, the trade-off curve may be biased. In this presentation, we demonstrate this kind of bias in typical cases, and we propose a method to remove the bias. We have found that the proposed method worked as long as the dimension for feature variables is not so large.

Key Words



Times Cited in Web of Science:

Cited in Books: