Bump hunting, proposed by Friedman and Fisher, has become important
in many fields, such as marketing and medicine.
In particular, local sparse bump hunting algorithms such as CART
(Classification and Regression Trees) and PRIM (Patient Rule
Induction Method) are useful for addressing the unresolved question
of molecular heterogeneity and of tumoral phenotype in cancer. In
bump hunting, we use the trade-off curve, rather than the
misclassification rate used in classification problems, as the
criterion for evaluating how effectively the algorithm works. The
trade-off curve is constructed by finding the relation between the
pureness rate and the capture rate. So far, we have assessed the
accuracy of the trade-off curve in typical fundamental cases that
may be observed in practice, and found that the proposed tree-GA
can construct an effective trade-off curve. In addition, we
investigated the prediction accuracy of the tree-GA by comparing
its trade-off curve with that obtained by PRIM, and found that the
tree-GA is superior to PRIM when the sample size is large. In this
paper, to focus on the sparse, small-sample cases seen in medical
applications, we investigate typical fundamental cases using Monte
Carlo simulations and find that non-ignorable biases exist in the
tree-GA. We propose a method to remove such biases.
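
To make the notion of a trade-off curve concrete, the following is a minimal sketch in Python, not the tree-GA or PRIM themselves. It assumes the usual bump hunting definitions, which the abstract does not spell out: the pureness rate is the fraction of response-1 points among all points inside a candidate region, and the capture rate is the fraction of all response-1 points that fall inside that region. The function names, the one-dimensional interval regions, and the toy data are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def pureness_and_capture(
    xs: List[float], ys: List[int], lo: float, hi: float
) -> Tuple[float, float]:
    """Return (pureness rate, capture rate) of the interval [lo, hi].

    Assumed definitions (not stated in the abstract):
      pureness = response-1 points inside / all points inside
      capture  = response-1 points inside / all response-1 points
    """
    inside = [y for x, y in zip(xs, ys) if lo <= x <= hi]
    hits = sum(inside)
    total_positive = sum(ys)
    pureness = hits / len(inside) if inside else 0.0
    capture = hits / total_positive if total_positive else 0.0
    return pureness, capture


def trade_off_curve(
    xs: List[float], ys: List[int], regions: List[Tuple[float, float]]
) -> List[Tuple[float, float]]:
    """Evaluate a list of candidate regions; each one is a point on the curve."""
    return [pureness_and_capture(xs, ys, lo, hi) for lo, hi in regions]


# Toy data: one explanatory variable and a binary response (hypothetical).
xs = [0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 0.9]
ys = [0, 0, 1, 1, 1, 0, 1]

# Regions of increasing size: a tight region is pure but captures less,
# a wide region captures everything but is less pure.
regions = [(0.4, 0.6), (0.3, 0.9), (0.0, 1.0)]
curve = trade_off_curve(xs, ys, regions)

for (lo, hi), (p, c) in zip(regions, curve):
    print(f"region [{lo}, {hi}]: pureness={p:.3f}, capture={c:.3f}")
```

In this sketch, shrinking the region raises the pureness rate while lowering the capture rate, which is the trade-off the curve is meant to summarise. In the small-sample setting, both rates are ratios of small counts, so estimates of the curve can be expected to be noisy.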