The
bump hunting is to find the regions where points we are interested
in are located more densely than elsewhere and are hardly separable
from other points.
By specifying a pureness rate p for the points, a maximum capture rate c of
the points could be obtained. Then, a trade-off curve between p and c can
be constructed. Thus, to find the bump regions is equivalent to construct the
trade-off curve.
We adopt simpler boundary shapes for the bumps such as the box-shaped regions
located parallel to variable axes for convenience. We use the genetic algorithm,
specified to the tree structure, called the tree-GA, to obtain the maximum capture
rates, because the conventional binary decision tree will not provide the maximum
capture rates. Using the tree-GA tendency providing many local maxima for the
capture rates, we can estimate the return period for the trade-off curve by using
the extreme-value statistics.
We have assessed the accuracy for the trade-off curve in typical fundamental
cases that may be observed in real customer data cases, and found that the proposed
tree-GA can construct the effective trade-off curve which is close to the optimal
one.
|
|
|
|
|
data mining, decision tree, genetic algorithm,
bump hunting, extreme-value statistics, trade-off curve,
accuracy, return period, evaluation.
|
|
|