Title
The bump hunting method using the genetic algorithm with the extreme-value statistics

Authors
T. Yukizane, S. Ohi, E. Miyano, H. Hirose

Source
IEICE Transactions D: on Information and Systems, Vol.E89-D, No.8, pp.2332-2339 (2006.8)

Abstract
In classification problems where a messy data structure makes it difficult to separate z-dimensional points with 0-1 responses into two groups, we try to find the denser regions of the favorable customers (response 1) instead of the boundaries that separate the two groups. Such regions are called bumps, and finding their boundaries is called bump hunting. The main objective of this paper is to find the largest bump region under a specified ratio of the number of response-1 points to the total number of points in the region. Varying the specified ratio then yields a trade-off curve between the number of captured response-1 points and that ratio. The decision tree method with the Gini index provides simple-shaped bump boundaries if the marginal density of response 1 has a rather simple or monotonic shape. Because searching for the optimal tree is NP-hard and therefore computationally expensive, random search methods such as a genetic algorithm adapted to trees are useful. Because the problem has many local maxima, unlike typical genetic algorithm applications, extreme-value statistics are useful for estimating the global optimum number of captured points; this estimate also guarantees the accuracy of the semi-optimal solution, which is expressed by simple descriptive rules. This combination of genetic algorithm search and extreme-value statistics is new. We apply the method to artificial messy data that mimic a real customer database and obtain a successful result. The reliability of the solution is also discussed.
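The abstract does not say which extreme-value model or estimator the paper uses. As a rough illustration only, assume that the best captured-point count from each independent GA run follows a reverse-Weibull (Type III extreme-value) law F(x) = exp(-((ω - x)/σ)^α) with a finite upper endpoint ω, which plays the role of the global optimum. Then ln(-ln F) is linear in ln(ω - x), so ω can be estimated by choosing, on a grid, the endpoint that makes the probability plot most nearly straight. The function name `estimate_global_optimum`, the grid search, and the R² criterion are hypothetical choices for this sketch, not the authors' procedure.

```python
import math
from statistics import fmean


def _fit_line(xs, ys):
    """Ordinary least squares y = a + b * x; returns (a, b, r_squared)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b, sxy * sxy / (sxx * syy)


def estimate_global_optimum(run_maxima, grid_steps=2000, search_width=None):
    """Hypothetical sketch: estimate the upper endpoint omega of a
    reverse-Weibull law fitted to the best value of each independent GA run.

    Assumed model: F(x) = exp(-((omega - x) / sigma) ** alpha) for x < omega,
    so ln(-ln F) = alpha * ln(omega - x) - alpha * ln(sigma). For the right
    omega the probability plot is a straight line; the omega on the grid that
    maximises the R^2 of that line is returned as (omega, alpha, sigma).
    """
    xs = sorted(run_maxima)
    n = len(xs)
    if n < 3 or xs[-1] == xs[0]:
        raise ValueError("need at least 3 runs with differing maxima")
    # Weibull plotting positions F_i = i / (n + 1) for the i-th smallest maximum.
    ys = [math.log(-math.log(i / (n + 1))) for i in range(1, n + 1)]
    if search_width is None:
        search_width = 5.0 * (xs[-1] - xs[0])
    best = None  # (r2, omega, intercept, slope)
    for k in range(1, grid_steps + 1):
        # Candidate endpoints lie strictly above the best value actually observed.
        omega = xs[-1] + search_width * k / grid_steps
        a, b, r2 = _fit_line([math.log(omega - x) for x in xs], ys)
        # The slope equals the shape parameter alpha, which must be positive.
        if b > 0 and (best is None or r2 > best[0]):
            best = (r2, omega, a, b)
    if best is None:
        raise ValueError("no candidate endpoint gave a positive shape parameter")
    _, omega, a, alpha = best
    # Intercept a = -alpha * ln(sigma), hence sigma = exp(-a / alpha).
    return omega, alpha, math.exp(-a / alpha)


if __name__ == "__main__":
    # Synthetic check on exact reverse-Weibull quantiles (omega=100, sigma=5, alpha=2).
    n = 50
    sample = [100 - 5 * math.sqrt(-math.log(i / (n + 1))) for i in range(1, n + 1)]
    print(estimate_global_optimum(sample))
```

In use, `run_maxima` would hold the best captured-point counts from many GA runs; the gap between the estimated ω and the best observed count gives a sense of how far the semi-optimal solution may be from the global optimum. If the chosen ω sits at the upper edge of the grid, the data look closer to a Gumbel-type tail and this sketch identifies no finite bound.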

Key Words
data mining, data science, bump hunting, genetic algorithm, extreme-value statistics, trade-off curve, decision tree, bootstrap

Citation

 

Times Cited in Web of Science: 6

Times Cited in Google Scholar: 15

http://scholar.google.com/; http://ietisy.oxfordjournals.org/cgi/content/abstract/E89-D/8/2332; http://www.topix.net/jp/fukuoka

Cited in Books:

WoS:
DISCRETE OPTIMIZATION, Vol. 13, pp. 36-48, Aug 2014
INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, Vol. 14, No. 10, pp. 3409-3424, Oct 2011
IEEE TRANSACTIONS ON DIELECTRICS AND ELECTRICAL INSULATION, Vol. 17, No. 1, pp. 271-279, Feb 2010
THIRD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING: WKDD 2010, PROCEEDINGS, pp. 597-600, 2010
WORLD CONGRESS ON ENGINEERING AND COMPUTER SCIENCE, pp. 713-717, 2007
7TH IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY, PROCEEDINGS, pp. 128-132, 2007

Cited in other journals: