Publications
2018
Cravero, F.; Schustik, S.; Martínez, M. J.; Barranco, C. D.; Díaz, M. F.; Ponzoni, I. Practical Applications of Computational Biology and Bioinformatics, 12th International Conference, 2018, ISBN: 978-3-319-98702-6.
Tags: Artificial intelligence, Feature selection
Abstract: QSPR (Quantitative Structure-Property Relationship) models proposed in Polymer Informatics typically use reduced computational representations of polymers to avoid the complex issues related to the polydispersion of these industrial materials. In this work, the aim is to assess the effect of this oversimplification on modelling decisions and to analyze strategies for addressing alternative characterizations of the materials that capture, at least partially, the polydispersion phenomenon. In particular, a cheminformatic study for estimating a tensile property of polymers is presented. Four different computational representations are analyzed in combination with several machine learning approaches, both for selecting the molecular descriptors most relevant to the target property and for learning the corresponding QSPR models. The results give insight into the limitations of using oversimplified representations of polymers and contribute alternative strategies for achieving more realistic models.
Martínez-García, P. M.; García-Torres, M.; Divina, F.; Gómez-Vela, F.; Cortés-Ledesma, F. Applications of Evolutionary Computation, 2018, ISBN: 978-3-319-77538-8.
Tags: Binding sites, Classification, Feature selection
Abstract: Topoisomerases are proteins that regulate the topology of DNA by introducing transient breaks to relax supercoiling. In this paper we focus on Topoisomerase 2 (TOP2), which generates double-strand DNA breaks that, if inefficiently repaired, can seriously compromise genomic stability. It is therefore important to gain insight into the molecular processes involved in TOP2-DNA binding. To do this, we collected genomic and epigenomic information from publicly available high-throughput sequencing projects and systematically quantified it within experimentally measured TOP2 binding sites. We then applied feature selection techniques both to improve classification performance and to gain insight into the particular properties that may be of biological relevance. The results allowed us to identify a core set of predictive chromatin features that faithfully explain TOP2 binding.
2016
García-Torres, M.; Gómez-Vela, F.; Melián-Batista, B.; Moreno-Vega, J. M. High-dimensional feature selection via feature grouping: A Variable Neighborhood Search approach. Journal Article. In: Information Sciences, vol. 326, pp. 102-118, 2016, ISSN: 0020-0255.
Tags: Feature selection, Metaheuristic
Abstract: In recent years, advances in technology have led to increasingly high-dimensional datasets. This increase in dimensionality, together with the presence of irrelevant and redundant features, makes the feature selection process challenging with respect to efficiency and effectiveness. In this context, approximate algorithms are typically applied since they provide good solutions in a reasonable time. On the other hand, feature grouping has arisen as a powerful approach to reduce dimensionality in high-dimensional data. Recently, some authors have focused on developing methods that combine feature grouping and feature selection to improve the model. In this paper, we propose a feature selection strategy that uses feature grouping to increase the effectiveness of the search. As the feature selection strategy, we propose a Variable Neighborhood Search (VNS) metaheuristic. We then group the input space into subsets of features using the concept of Markov blankets. To the best of our knowledge, this is the first time the Markov blanket has been used for grouping features. We test the performance of VNS by conducting experiments on several high-dimensional datasets from two different domains: microarray and text mining. We compare VNS with popular and competitive techniques. Results show that VNS is a competitive strategy capable of finding small feature subsets with predictive power similar to that obtained with the other algorithms used in this study.
2006
Aguilar-Ruiz, J.; Nepomuceno, J. A.; Díaz-Díaz, N.; Nepomuceno-Chamorro, I. A. A Measure for Data Set Editing by Ordered Projections. Conference. Advances in Applied Artificial Intelligence, 2006, ISBN: 978-3-540-35454-3.
Tags: Feature selection
Abstract: In this paper we study a measure, named the weakness of an example, which establishes how important an example is for finding representative patterns in the data set editing problem. Our approach consists of reducing the database size without losing information, using the Patterns by Ordered Projections algorithm. The idea is to relax the reduction factor with a new parameter, removing all examples in the database whose weakness satisfies a condition on this parameter. We study how to set this new parameter. Our experiments were carried out on all databases from the UCI Repository and show that a size reduction is possible in complex databases without a notable increase in the error rate.
2005
Ruiz, R.; Aguilar-Ruiz, J.; Riquelme, J. C.; Díaz-Díaz, N. Analysis of Feature Rankings for Classification. Conference. Advances in Intelligent Data Analysis VI, 2005, ISBN: 978-3-540-31926-9.
Tags: Feature selection
Abstract: This paper presents different ways of contrasting the rankings generated by feature selection algorithms, showing several possible interpretations depending on the approach taken in each study. We start from the premise that no single ideal subset exists for all cases. The purpose of these algorithms is to reduce the data set to its first-ranked attributes without losing predictive power relative to the original data set. In this paper we propose a method, feature-ranking performance, to compare different feature-ranking methods, based on the Area Under the Feature Ranking Classification Performance Curve (AURC). The conclusions and trends drawn from this paper offer support for the learning tasks in which the ranking algorithms studied here operate.