Publications
2021
Lopez-Fernandez, A.; Rodríguez-Baena, D.; Gómez-Vela, F.; Divina, F.; García-Torres, M. A multi-GPU biclustering algorithm for binary datasets. Journal Article. In: Journal of Parallel and Distributed Computing, vol. 147, pp. 209-219, 2021, ISSN: 0743-7315.
Tags: Biclustering, Big Data, CUDA, GPU
Abstract: Graphics Processing Unit (GPU) technology and the CUDA architecture are among the most widely used options for adapting machine learning techniques to the huge amounts of complex data currently being generated. Biclustering techniques are useful for discovering local patterns in datasets. Those that have been implemented to use GPU resources in parallel have improved their computational performance. However, this does not guarantee that they can successfully process large datasets. Some important issues must be taken into account, such as data transfers between CPU and GPU memory and the balanced distribution of workload across GPU resources. In this paper, a GPU version of one of the fastest biclustering solutions, BiBit, is presented. This implementation, named gBiBit, has been designed to take full advantage of the computational resources offered by GPU devices. Whether using a single GPU device or its multi-GPU mode, gBiBit is able to process large binary datasets. The experimental results show that gBiBit improves on the computational performance of BiBit, of ParBiBit (a parallel CPU version) and of CUBiBit (an early GPU version). The gBiBit source code is available at https://github.com/aureliolfdez/gbibit.
2020
Lopez-Fernandez, A.; Rodríguez-Baena, D.; Gómez-Vela, F. gMSR: A Multi-GPU Algorithm to Accelerate a Massive Validation of Biclusters. Journal Article. In: Electronics, vol. 9, no. 11, pp. 1782, 2020, ISSN: 2079-9292.
Tags: Biclustering, Biclustering validation, CUDA, GPU, MSR
Abstract: Nowadays, Biclustering is one of the most widely used machine learning techniques for discovering local patterns in datasets from areas such as energy consumption, marketing, social networks and bioinformatics, among others. In bioinformatics in particular, Biclustering techniques have become extremely time-consuming and generate a huge number of results, due to the continuous increase in the size of databases over the last few years. For this reason, validation techniques must be adapted to this new environment to help researchers focus their efforts on a specific subset of results in an efficient, fast and reliable way. This situation may well be considered a Big Data context. Multiple machine learning techniques have been implemented with Graphics Processing Unit (GPU) technology and the CUDA architecture to accelerate the processing of large databases. However, as far as we know, this technology has not yet been applied to any bicluster validation technique. In this work, a multi-GPU version of one of the most used bicluster validation measures, Mean Squared Residue (MSR), is presented. It takes advantage of all the hardware and memory resources offered by GPU devices. Because of this, gMSR is able to validate a massive number of biclusters in any Biclustering-based study within a Big Data context.
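The abstract above names Mean Squared Residue (MSR) without defining it. As a rough illustration, the measure as it is commonly defined in the biclustering literature can be computed on the CPU as sketched below. This is not the gMSR implementation: the function name `mean_squared_residue` and its arguments are hypothetical, and the sketch assumes the usual definition, in which each cell's residue is its value minus its row mean and column mean plus the overall mean of the bicluster.

```python
def mean_squared_residue(data, rows, cols):
    """MSR of the bicluster selected by `rows` and `cols` (illustrative only)."""
    # Extract the submatrix defined by the bicluster's rows and columns.
    sub = [[float(data[i][j]) for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])

    # Row means, column means and overall mean of the submatrix.
    row_means = [sum(row) / n_cols for row in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    # Average the squared residues over every cell of the bicluster.
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


if __name__ == "__main__":
    checkerboard = [[0, 1], [1, 0]]
    print(mean_squared_residue(checkerboard, [0, 1], [0, 1]))  # 0.25
```

A lower value indicates a more coherent bicluster. A perfectly additive pattern, where rows differ only by a constant shift, scores zero up to floating-point rounding.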
2013
Rodríguez-Baena, D. Extracting and validating biclusters from binary datasets. Journal Article. In: AI Communications, vol. 26, no. 4, pp. 417-418, 2013.
Tags: Biclustering, Binary dataset, Biological validation
Abstract: This work proposes a novel algorithm to extract biclusters from binary datasets: the Bit-Pattern Biclustering Algorithm (BiBit). The selective search performed by BiBit, based on a very fast bit-word processing technique, provides very satisfactory results in quality and computational cost. In addition, a new software tool named CarGene (Characterization of Genes), which helps scientists validate sets of genes using biological knowledge, is introduced.
2011
Gómez-Vela, F.; Martínez-Álvarez, F.; Barranco, C. D.; Díaz-Díaz, N.; Rodríguez-Baena, D.; Aguilar-Ruiz, J. Pattern Recognition in Biological Time Series. Journal Article. In: Advances in Artificial Intelligence, pp. 164-172, 2011, ISBN: 978-3-642-25274-7.
Tags: Biclustering, Clustering, Gene Network
Abstract: Knowledge extraction from gene expression data has been one of the main challenges in the bioinformatics field during the last few years. In this context, a particular kind of data, data retrieved on a temporal basis (also known as time series), provides information about the way a gene can be expressed over time. This work presents an exhaustive analysis of the latest proposals in this area, focusing particularly on those that use non-supervised machine learning techniques (i.e. clustering, biclustering and regulatory networks) to find relevant patterns in gene expression.
Rodríguez-Baena, D.; Pérez-Pulido, A. J.; Aguilar-Ruiz, J. A biclustering algorithm for extracting bit-patterns from binary datasets. Journal Article. In: Bioinformatics, vol. 27, no. 19, pp. 2738-2745, 2011, ISSN: 1367-4803.
Tags: Biclustering, Binary dataset
Abstract:
Motivation: Binary datasets represent a compact and simple way to store data about the relationships between a group of objects and their possible properties. In the last few years, different biclustering algorithms have been specially developed to be applied to binary datasets. Several approaches based on matrix factorization, suffix trees or divide-and-conquer techniques have been proposed to extract useful biclusters from binary data, and these approaches provide information about the distribution of patterns and intrinsic correlations.
Results: A novel approach to extracting biclusters from binary datasets, BiBit, is introduced here. The results obtained from different experiments with synthetic data reveal the excellent performance and the robustness of BiBit to density and size of input data. Also, BiBit is applied to a central nervous system embryonic tumor gene expression dataset to test the quality of the results. A novel gene expression preprocessing methodology, based on expression level layers, and the selective search performed by BiBit, based on a very fast bit-pattern processing technique, provide very satisfactory results in quality and computational cost. The power of biclustering in finding genes involved simultaneously in different cancer processes is also shown. Finally, a comparison with Bimax, one of the most cited binary biclustering algorithms, shows that BiBit is faster while providing essentially the same results.
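The bit-pattern idea mentioned in the BiBit abstracts can be illustrated with a minimal sketch. It is a simplified reading and not the published implementation. It assumes that each row is packed into a single integer, that the bitwise AND of a pair of rows seeds a candidate column pattern, and that every row containing that pattern joins the bicluster. The names `encode_row` and `bit_pattern_biclusters` and the thresholds `min_rows` and `min_cols` are hypothetical.

```python
from itertools import combinations


def encode_row(bits):
    """Pack a sequence of 0/1 values into one integer (bit j = column j)."""
    word = 0
    for j, bit in enumerate(bits):
        if bit:
            word |= 1 << j
    return word


def bit_pattern_biclusters(matrix, min_rows=2, min_cols=2):
    """Return (rows, cols) pairs found by pairwise-AND pattern seeding."""
    words = [encode_row(row) for row in matrix]
    n_cols = len(matrix[0])
    seen = set()      # patterns already expanded, to avoid duplicates
    biclusters = []
    for a, b in combinations(range(len(words)), 2):
        # The shared 1-columns of two rows form a candidate pattern.
        pattern = words[a] & words[b]
        if pattern in seen or bin(pattern).count("1") < min_cols:
            continue
        seen.add(pattern)
        # Every row whose bits include the whole pattern belongs to it.
        rows = [i for i, w in enumerate(words) if (w & pattern) == pattern]
        if len(rows) >= min_rows:
            cols = [j for j in range(n_cols) if (pattern >> j) & 1]
            biclusters.append((rows, cols))
    return biclusters


if __name__ == "__main__":
    data = [
        [1, 1, 0, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
    ]
    print(bit_pattern_biclusters(data))  # [([0, 1, 2], [0, 1])]
```

Packing rows into integers lets one AND operation compare many columns at once, which is plausibly where the speed described in the abstract comes from. A real implementation would use fixed-width machine words rather than Python's arbitrary-precision integers.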