Preprint Online March 04, 2011

Correlation-based Gene Selection and Classification Using Taguchi-BPSO

Journal: Methods of Information in Medicine
ISSN: 0026-1270
DOI: http://dx.doi.org/10.3414/ME09-01-0010
Issue: 2010 (Vol. 49), Issue 3
Pages: 254-268

Original Article

L.-Y. Chuang (1), C.-S. Yang (2, 3), K.-C. Wu (4), C.-H. Yang (5, 6)

(1) Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung, Taiwan; (2) Department of Plastic Surgery, Chiayi Christian Hospital, Chiayi, Taiwan; (3) Institute of Biomedical Engineering, National Cheng Kung University, Tainan, Taiwan; (4) Department of Computer Science and Information Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan; (5) Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan; (6) Department of Network Systems, Toko University, Chiayi, Taiwan

Summary

Background: Microarray gene expression profiles have yielded valuable results for a variety of problems and have contributed to advances in clinical medicine. Microarray data characteristically have a high dimension and a small sample size, which makes it difficult for a general classification method to classify samples correctly. However, not every gene is relevant for distinguishing the sample class. Feature (gene) selection is therefore crucial for analyzing gene expression profiles correctly, and an effective gene extraction method is needed to eliminate irrelevant genes and decrease the classification error rate.

Objective: The purpose of gene expression analysis is to discriminate between classes of samples, and to predict the relative importance of each gene for sample classification.

Method: In this paper, correlation-based feature selection (CFS) and Taguchi-binary particle swarm optimization (TBPSO) were combined into a hybrid method, and the K-nearest neighbor (K-NN) method with leave-one-out cross-validation (LOOCV) served as a classifier for ten gene expression profiles.

Results: Experimental results show that this hybrid method effectively simplifies feature selection by reducing the number of features needed. The proposed method achieved the lowest classification error rate on all ten gene expression data sets tested. For six of the data sets, a classification error rate of zero could be reached.

Conclusion: The introduced method outperformed five other methods from the literature in terms of classification error rate. It could thus constitute a valuable tool for gene expression analysis in future studies.
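To make the evaluation step named in the Method concrete, the following is a minimal sketch of how a K-NN classifier with LOOCV might score one candidate gene subset, encoded as a binary mask such as a BPSO particle position. The summary does not specify the value of K, the distance metric, or the fitness definition, so Euclidean distance, k=1, the handling of an empty subset, and the function name `loocv_knn_error` are all assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def loocv_knn_error(samples, labels, mask, k=1):
    """Leave-one-out error rate of K-NN using only genes where mask is 1.

    Hypothetical helper: Euclidean distance and k=1 are assumptions,
    not settings taken from the paper.
    """
    selected = [i for i, bit in enumerate(mask) if bit]
    if not selected:
        return 1.0  # assumption: an empty gene subset counts as worst case
    errors = 0
    for held_out in range(len(samples)):
        # Distance from the held-out sample to every other sample,
        # computed over the selected genes only.
        dists = []
        for j in range(len(samples)):
            if j == held_out:
                continue
            d = math.sqrt(sum((samples[held_out][g] - samples[j][g]) ** 2
                              for g in selected))
            dists.append((d, labels[j]))
        dists.sort(key=lambda t: t[0])
        # Majority vote among the k nearest neighbors.
        votes = Counter(label for _, label in dists[:k])
        predicted = votes.most_common(1)[0][0]
        if predicted != labels[held_out]:
            errors += 1
    return errors / len(samples)


# Toy data (made up): gene 0 separates the classes, gene 2 is misleading.
samples = [[0.1, 5.0, 9.0], [0.2, 1.0, 0.0], [0.9, 5.1, 0.5], [1.0, 0.9, 9.5]]
labels = ["A", "A", "B", "B"]
print(loocv_knn_error(samples, labels, [1, 0, 0]))  # informative gene only
print(loocv_knn_error(samples, labels, [0, 0, 1]))  # misleading gene only
```

In a wrapper scheme of this kind, a search procedure would call such a function once per candidate mask and prefer masks with a lower error rate and fewer selected genes.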

Keywords

Microarray data, correlation-based feature selection, Taguchi-binary particle swarm optimization, K-nearest neighbor
