A Comparative Study in the Bump Hunting between the Tree-GA and the PRIM


Hideo Hirose, Genki Koga


13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD2012), August 08-10, 2012, Kyoto, Japan

Studies in Computational Intelligence, Volume 443, 13-25, DOI: 10.1007/978-3-642-32172-6_2,@2013 Springer

The bump hunting, proposed by Friedman and Fisher, has become important in many fields. Suppose that we are interested in finding regions where target points are denser than other regions. Such dense regions of target points are called the bumps, and finding them is called bump hunting. By pre-specifying a pureness rate in advance, a maximum capture rate could be obtained. Then, a trade-off curve between the two can be constructed. Thus, to find the bump regions is equivalent to construct the trade-off curve. When we adopt simpler boundary shapes for the bumps such as the union of boxes located parallel to some explanation variable axes, it would be convenient to adopt the binary decision tree. Since the conventional binary decision tree, e.g., CART (Classification and Regression Trees), will not provide the maximum capture rates, we use the genetic algorithm (GA), specified to the tree structure, the tree-GA. So far, we assessed the accuracy for the trade-off curve in typical fundamental cases that may be observed in real customer data cases, and found that the proposed tree-GA can construct the effective trade-off curve which is close to the optimal one. In this paper, we further investigate the prediction accuracy of the tree-GA by comparing the trade-off curve obtained by using the tree-GA with that obtained by using the PRIM (Patient Rule Induction Method) proposed by Friedman and Fisher. We have found that the tree-GA reveals the superiority over the PRIM in some cases.

Key Words
data mining, bump hunting, trade-off curve, genetic algorithm, CART, PRIM



Times Cited in Web of Science:

Cited in Books: