Selection bias in gene extraction on the basis of microarray gene-expression data

DOI: 10.1073/pnas.102102699

Citations: 470
In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias. There is no allowance because the rule is either tested on tissue samples that were used in the first instance to select the genes being used in the rule or because the cross-validation of the rule is not external to the selection process; that is, gene selection is not performed in training the rule at each stage of the cross-validation process. We describe how in practice the selection bias can be assessed and corrected for by either performing a cross-validation or applying the bootstrap external to the selection process. We recommend using 10-fold rather than leave-one-out cross-validation, and concerning the bootstrap, we suggest using the so-called .632+ bootstrap error estimate designed to handle overfitted prediction rules. Using two published data sets, we demonstrate that when correction is made for the selection bias, the cross-validated error is no longer zero for a subset of only a few genes.
Journal: Proceedings of the National Academy of Sciences (PNAS), vol. 99, no. 10, pp. 6562-6566, 2002
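
The difference between cross-validation that is internal and external to the selection process, as described in the abstract, can be illustrated with a minimal sketch. It assumes pure-noise expression data, so the true error rate of any rule is 50%. The function names (`select_genes`, `nearest_centroid_predict`, `cv_error`), the mean-difference ranking score, and the nearest-centroid classifier are hypothetical stand-ins chosen for brevity; they are not the selection methods or classifiers used in the paper. Folds are not stratified.

```python
import numpy as np

def select_genes(X, y, k):
    """Return indices of the k genes with the largest absolute difference in
    class means, scaled by the overall per-gene standard deviation."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    s = X.std(axis=0) + 1e-12  # small constant avoids division by zero
    return np.argsort(-np.abs(m0 - m1) / s)[:k]

def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test sample to the class with the closer training centroid."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def cv_error(X, y, k, n_folds, external, rng):
    """Cross-validated error rate.

    external=True : genes are reselected on each training fold.
    external=False: genes are selected once on all samples, so every test
                    sample has already influenced the selection (biased).
    """
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    genes_all = select_genes(X, y, k)  # only used when external is False
    errors = 0
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        genes = select_genes(X[train], y[train], k) if external else genes_all
        pred = nearest_centroid_predict(
            X[train][:, genes], y[train], X[test_idx][:, genes]
        )
        errors += int((pred != y[test_idx]).sum())
    return errors / n

# Assumed toy data: 40 samples, 2000 genes, labels unrelated to expression.
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 2000
X = rng.standard_normal((n_samples, n_genes))
y = np.array([0, 1] * (n_samples // 2))

internal_err = cv_error(X, y, k=10, n_folds=10, external=False, rng=rng)
external_err = cv_error(X, y, k=10, n_folds=10, external=True, rng=rng)
print(f"10-fold CV error, selection outside the CV loop: {internal_err:.2f}")
print(f"10-fold CV error, selection inside each fold:    {external_err:.2f}")
```

Because the labels carry no information here, an honest estimate should sit near 0.5. Selecting genes once on all samples typically yields a much lower, optimistically biased figure, which is the selection bias the abstract describes.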
    • ...Guyon et al. [10] claimed to obtain better results with the recursive feature elimination method, but, as pointed out by [32], their work contained a methodological flaw. We use the SVM recursive feature elimination algorithm with this bias removed and present these results as well for comparison (referred to as “SVM+rfe” in Table 1). Finally, we also compare our results with the state-of-the-art Adaboost algorithm...

    Mohak Shah et al. Feature Selection with Conjunctions of Decision Stumps and Learning fr...

    • ...An extreme example of this is that all features are irrelevant to a response, but the selected features still appears fairly predictive to the response, which is, however, made wholly by chance (see Ambroise and McLachlan 2002, for a demonstration)...

    Longhai Li. Bias-Corrected Hierarchical Bayesian Classification With a Selected Su...

    • ...Though Ambroise and McLachlan [3] had shown that LOO-CV may bring in selection bias when used for gene selection, yet considering that we only using LOO-CV to optimize the parameter of SVM and the case of litter samples of tumor, we used it as Pochet et al [24] had done...

    Chun-Hou Zheng et al. Gene selection using independent variable group analysis for tumor cla...

    • ...Again, most of the papers reported crossvalidation (CV) testing accuracy of their methods which suffers from the “selection bias” as the testing sample is not excluded from the gene selection procedure [43]...
    • ...In order to evaluate the true performance of a computer-aided diagnosis (CAD) method, it is mandatory to exclude the testing samples from the classifier building process, i.e., data normalization, gene selection, and model parameter selection [43], [44]...
    • ...Previous study has shown that the CV10 is more appropriate when considering the compromise between bias and variance [43], [65]...

    Santanu Ghorai et al. Cancer Classification from Gene Expression Data by NPPC Ensemble

    • ...B.632+ is cross-validation like evaluation schema, and it was considered to have lower variance than others crossvalidation method in the small sample case [21]...

    Ruichu Cai et al. A new hybrid method for gene selection
