The paper presents an original filter approach for effective feature selection in microarray data, which are characterized by a large number of input variables and few samples. The approach is based on a new information-theoretic selection criterion, the double input symmetrical relevance (DISR), which relies on a measure of variable complementarity. This measure evaluates the additional information that a set of variables provides about the output beyond the sum of the contributions of each variable taken individually. We show that a variable selection approach based on DISR can be formulated as a quadratic optimization problem: the dispersion sum problem (DSP). To solve this problem, we use a strategy based on backward elimination and sequential replacement (BESR). The combination of BESR and the DISR criterion is compared in theoretical and experimental terms to recently proposed information-theoretic criteria. Experimental results on a synthetic dataset as well as on a set of eleven microarray classification tasks show that the proposed technique is competitive with existing filter selection methods.
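The notion of variable complementarity described in the abstract can be illustrated with a small worked example. The sketch below is not the authors' implementation: the function names are hypothetical, the estimates are plain empirical frequencies on discrete data, and the symmetrical relevance is assumed to be the joint mutual information normalized by the joint entropy, a formula the abstract itself does not state. The XOR toy data shows two variables that are individually uninformative but jointly determine the output.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Empirical joint entropy (in bits) of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(xs, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), where xs is a list of columns forming X."""
    return entropy(*xs) + entropy(y) - entropy(*xs, y)


def complementarity(xi, xj, y):
    """Information the pair carries about y beyond the sum of the single contributions."""
    joint = mutual_information([xi, xj], y)
    return joint - mutual_information([xi], y) - mutual_information([xj], y)


def symmetrical_relevance(xi, xj, y):
    """Assumed form: I(X_i, X_j; Y) / H(X_i, X_j, Y); returns 0.0 for constant data."""
    h = entropy(xi, xj, y)
    return mutual_information([xi, xj], y) / h if h > 0 else 0.0


# Toy data (hypothetical): y is the XOR of two independent binary inputs.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [a ^ b for a, b in zip(x1, x2)]

print(mutual_information([x1], y))       # each input alone carries no information
print(complementarity(x1, x2, y))        # the pair is fully complementary
print(symmetrical_relevance(x1, x2, y))  # normalized relevance of the pair
```

A criterion that ranks variables only by their individual mutual information with the output would discard both inputs here, which is the situation a complementarity-aware criterion is meant to handle.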
Schretter, C, Meyer, PE & Bontempi, G 2008, 'Information-Theoretic Feature Selection in Microarray Data Using Variable Complementarity', IEEE Journal of Selected Topics in Signal Processing, vol. 2, pp. 261-274. <http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4550559>
Schretter, C., Meyer, P. E., & Bontempi, G. (2008). Information-Theoretic Feature Selection in Microarray Data Using Variable Complementarity. IEEE Journal of Selected Topics in Signal Processing, 2, 261-274. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4550559
@article{0b64edffba404528a96bd381930703a6,
title = "Information-Theoretic Feature Selection in Microarray Data Using Variable Complementarity",
abstract = "The paper presents an original filter approach for effective feature selection in microarray data, which are characterized by a large number of input variables and few samples. The approach is based on a new information-theoretic selection criterion, the double input symmetrical relevance (DISR), which relies on a measure of variable complementarity. This measure evaluates the additional information that a set of variables provides about the output beyond the sum of the contributions of each variable taken individually. We show that a variable selection approach based on DISR can be formulated as a quadratic optimization problem: the dispersion sum problem (DSP). To solve this problem, we use a strategy based on backward elimination and sequential replacement (BESR). The combination of BESR and the DISR criterion is compared in theoretical and experimental terms to recently proposed information-theoretic criteria. Experimental results on a synthetic dataset as well as on a set of eleven microarray classification tasks show that the proposed technique is competitive with existing filter selection methods.",
keywords = "feature extraction, filtering theory, quadratic programming, signal classification",
author = "Schretter, Colas and Meyer, {Patrick E.} and Bontempi, Gianluca",
year = "2008",
language = "English",
volume = "2",
pages = "261--274",
journal = "IEEE Journal of Selected Topics in Signal Processing",
issn = "1932-4553",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}