Publications
2025
Lopez-Fernandez, A.; Gómez-Vela, F.; Rodríguez-Baena, D.; Gómez-Vela, F.; González-Domínguez, J. Biclustering in bioinformatics using big data and High Performance Computing applications: challenges and perspectives, a review. Journal Article. In: The Journal of Supercomputing, vol. 81, pp. 1123, 2025.
Tags: Big Data, Bioinformatics, Biological databases, Data Analysis and Big Data, Functional clustering, Protein databases
Abstract: Biclustering is a powerful machine learning technique that simultaneously groups rows and columns in matrix-based datasets. Applied to gene expression data in bioinformatics, its use has expanded alongside the rapid growth of high-throughput sequencing technologies, leading to massive and complex biological datasets. This review aims to examine how biclustering methods and their validation strategies are evolving to meet the demands of High Performance Computing (HPC) and Big Data environments. We present a structured classification of existing approaches based on the computational paradigms they employ, including MPI/OpenMP, Apache Hadoop/Spark, and GPU/CUDA. By synthesising these developments, we highlight current trends and outline key research challenges. The knowledge gathered in this work may support researchers in adapting and scaling biclustering algorithms to analyse large-scale biomedical data more efficiently. Our contribution is intended to bridge the gap between algorithmic innovation and computational scalability in the context of bioinformatics and data-intensive applications.
2024
Lopez-Fernandez, A.; Gómez-Vela, F.; Saz-Navarro, Dulcenombre M.; Delgado, F. M.; Rodríguez-Baena, D. Optimized Python library for reconstruction of ensemble-based gene co-expression networks using multi-GPU. Journal Article. In: The Journal of Supercomputing, 2024, ISSN: 1573-0484.
Tags: Big Data, Bioinformatics, Data Mining, Gene co-expression network, GPU, High-Performance Computing
Abstract: Gene co-expression networks are valuable tools for discovering biologically relevant information within gene expression data. However, analysing large datasets presents challenges due to the identification of nonlinear gene–gene associations and the need to process an ever-growing number of gene pairs and their potential network connections. These challenges mean that some experiments are discarded because the techniques do not support these intense workloads. This paper presents pyEnGNet, a Python library that can generate gene co-expression networks in High-performance computing environments. To do this, pyEnGNet harnesses CPU and multi-GPU parallel computing resources, efficiently handling large datasets. These implementations have optimised memory management and processing, delivering timely results. We have used synthetic datasets to prove the runtime and intensive workload improvements. In addition, pyEnGNet was used in a real-life study of patients after allogeneic stem cell transplantation with invasive aspergillosis and was able to detect biological perspectives in the study.
2021
Lopez-Fernandez, A.; Rodríguez-Baena, D.; Gómez-Vela, F.; Divina, F.; García-Torres, M. A multi-GPU biclustering algorithm for binary datasets. Journal Article. In: Journal of Parallel and Distributed Computing, vol. 147, pp. 209-219, 2021, ISSN: 0743-7315.
Tags: Biclustering, Big Data, CUDA, GPU
Abstract: Graphics Processing Unit (GPU) technology and the CUDA architecture are among the most widely used options for adapting machine learning techniques to the huge amounts of complex data currently being generated. Biclustering techniques are useful for discovering local patterns in datasets. Those that have been implemented to use GPU resources in parallel have improved their computational performance. However, this does not guarantee that they can successfully process large datasets. Some important issues must be taken into account, such as data transfers between CPU and GPU memory or the balanced distribution of workload across GPU resources. In this paper, a GPU version of one of the fastest biclustering solutions, BiBit, is presented. This implementation, named gBiBit, has been designed to take full advantage of the computational resources offered by GPU devices. Whether using a single GPU device or its multi-GPU mode, gBiBit is able to process large binary datasets. The experimental results have shown that gBiBit improves on the computational performance of BiBit, of its CPU parallel version (ParBiBit), and of an early GPU version (CUBiBit). gBiBit source code is available at https://github.com/aureliolfdez/gbibit.
2017
Gómez-Vela, F.; Lopez-Fernandez, A.; Lagares, J. A.; Rodríguez-Baena, D.; Barranco, C. D.; García-Torres, M.; Divina, F. Bioinformatics from a Big Data Perspective: Meeting the Challenge. Conference. IWBBIO 2017: Bioinformatics and Biomedical Engineering, pp. 349-359, Springer International Publishing, Cham, 2017, ISBN: 978-3-319-56154-7.
Tags: Big Data, Bioinformatics
Abstract: Recently, the rise of the Big Data paradigm has had a great impact in several fields. Bioinformatics is one such field. In fact, Bioinformatics had to evolve in order to adapt to this phenomenon. The exponential increase in the biological information available forced researchers to find new solutions to handle these new challenges.