The
bump hunting, proposed by Friedman and Fisher, has become important
in many fields. Suppose that we are interested in finding regions
where target points are denser than other regions. Such dense regions
of target points are called the bumps, and finding them is called
bump hunting. By pre-specifying a pureness rate in advance, a maximum
capture rate could be obtained. Then, a trade-off curve between
the two can be constructed. Thus, to find the bump regions is equivalent
to construct the trade-off curve. When we adopt simpler boundary
shapes for the bumps such as the union of boxes located parallel
to some explanation variable axes, it would be convenient to adopt
the binary decision tree. Since the conventional binary decision
tree will not provide the maximum capture rates, we use the genetic
algorithm (GA), specified to the tree structure, the tree-GA. Assessing
the accuracy for the trade-off curve in typical fundamental cases,
the proposed tree-GA can construct the effective trade-off curve
which is close to the optimal one comparing with that obtained
by using the PRIM (Patient Rule Induction Method) proposed by Friedman
and Fisher.
To apply this method in the medical field, we have to overcome the difficulty
of N-P problem, i.e., the number of samples, N, is less than the number of feature
variables, P. For higher dimension, the trade-off curve may be biased. In this
presentation, we demonstrate this kind of bias in typical cases, and we propose
a method to remove the bias. We have found that the proposed method worked as
long as the dimension for feature variables is not so large. |
|
|
|
|
|
|