Suppose that we are interested in classifying $n$ points in a
$z$dimensional space into two groups having response 1 and response
0 as the target variable. In some real data cases in customer classification,
it is difficult to discriminate the favorable customers showing
response 1 from others because many response 1 points and 0 points
are closely located. In such a case, to find the denser regions
to the favorable customers is considered to be an alternative.
Such regions are called the bumps, and finding them is called the
bump hunting.
By prespecifying a pureness rate $p$ in advance a maximum capture rate $c$
could be obtained. Then a tradeoff curve between $p$ and $c$ can be constructed.
Thus, to find the bump regions is equivalent to construct the tradeoff curve.
When we adopt simpler boundary shapes for the bumps such as the union of $z$dimensional
boxes located parallel to some explanation variable axes, it would be convenient
to adopt the binary decision tree. Since the conventional binary decision tree
will not provide the maximum capture rates, we use the genetic algorithm (GA),
specified to the tree structure, the treeGA. Using the property that the treeGA
has a tendency to provide many local maxima of the capture rates, we can estimate
the upper bound curve for the tradeoff curve by using the extremevalue statistics.
Since the bump regions obtained by using the treeGA are conservative comparing
to the optimal regions, we should investigate how accurate the tradeoff curve
is using the treeGA. We have assessed the accuracy for the tradeoff curve
in typical fundamental cases that may be observed in real customer data cases.
We have found that the proposed treeGA can construct the effective tradeoff
curve which is close to the optimal one.
