13th ACIS International Conference on Software Engineering,
Artificial Intelligence, Networking and Parallel/Distributed
Computing (SNPD2012), August 08-10, 2012, Kyoto, Japan
Studies in Computational Intelligence, Volume 443,
13-25, DOI: 10.1007/978-3-642-32172-6_2,@2013 Springer
|
|
|
The
bump hunting, proposed by Friedman and Fisher, has become important
in many fields. Suppose that we are interested in finding regions
where target points are denser than other regions. Such dense regions
of target points are called the bumps, and finding them is called
bump hunting. By pre-specifying a pureness rate in advance, a maximum
capture rate could be obtained. Then, a trade-off curve between
the two can be constructed. Thus, to find the bump regions is equivalent
to construct the trade-off curve. When we adopt simpler boundary
shapes for the bumps such as the union of boxes located parallel
to some explanation variable axes, it would be convenient to adopt
the binary decision tree. Since the conventional binary decision
tree, e.g., CART (Classification and Regression Trees), will not
provide the maximum capture rates, we use the genetic algorithm
(GA), specified to the tree structure, the tree-GA. So far, we
assessed the accuracy for the trade-off curve in typical fundamental
cases that may be observed in real customer data cases, and found
that the proposed tree-GA can construct the effective trade-off
curve which is close to the optimal one. In this paper, we further
investigate the prediction accuracy of the tree-GA by comparing
the trade-off curve obtained by using the tree-GA with that obtained
by using the PRIM (Patient Rule Induction Method) proposed by Friedman
and Fisher. We have found that the tree-GA reveals the superiority
over the PRIM in some cases. |
|
|
|
|
data
mining, bump hunting, trade-off curve, genetic algorithm,
CART, PRIM
|
|
|