He has been working on knowledge extraction since his Ph.D. thesis at Pablo de Olavide University. His current research lines concern new database technologies, big data programming techniques and sensor data processing for knowledge extraction.
Teaching
Computer Science (Information Systems), Pablo de Olavide University.
- Database Design.
- Database Management.
- Final Degree Project.
Computer Science, Pablo de Olavide University.
- Cloud Computing.
- Final Master Project.
History and Digital Humanities, Pablo de Olavide University.
- Introduction to the theory and methodology of historical analysis and digital humanities.
- Final Master Project.
- External internships.
Publications
2024
A. Lopez-Fernandez; F. Gómez-Vela; Dulcenombre M. Saz-Navarro; F. M. Delgado; D. Rodríguez-Baena. Optimized Python library for reconstruction of ensemble-based gene co-expression networks using multi-GPU. Journal Article. In: The Journal of Supercomputing, 2024, ISSN: 1573-0484. Gene co-expression networks are valuable tools for discovering biologically relevant information within gene expression data. However, analysing large datasets presents challenges due to the identification of nonlinear gene–gene associations and the need to process an ever-growing number of gene pairs and their potential network connections. These challenges mean that some experiments are discarded because the techniques do not support these intense workloads. This paper presents pyEnGNet, a Python library that can generate gene co-expression networks in High-performance computing environments. To do this, pyEnGNet harnesses CPU and multi-GPU parallel computing resources, efficiently handling large datasets. These implementations have optimised memory management and processing, delivering timely results. We have used synthetic datasets to prove the runtime and intensive workload improvements. In addition, pyEnGNet was used in a real-life study of patients after allogeneic stem cell transplantation with invasive aspergillosis and was able to detect biological perspectives in the study.
J. Figueroa-Martinez; Dulcenombre M. Saz-Navarro; A. Lopez-Fernandez; D. Rodríguez-Baena; F. Gómez-Vela. Computational Ensemble Gene Co-Expression Networks for the Analysis of Cancer Biomarkers. Journal Article. In: Informatics, vol. 11, no. 2, pp. 14, 2024, ISSN: 2227-9709. Gene networks have become a powerful tool for the comprehensive examination of gene expression patterns. Thanks to these networks generated by means of inference algorithms, it is possible to study different biological processes and even identify new biomarkers for such diseases. These biomarkers are essential for the discovery of new treatments for genetic diseases such as cancer. In this work, we introduce an algorithm for genetic network inference based on an ensemble method that improves the robustness of the results by combining two main steps: first, the evaluation of the relationship between pairs of genes using three different co-expression measures, and, subsequently, a voting strategy. The utility of this approach was demonstrated by applying it to a human dataset encompassing breast and prostate cancer-associated stromal cells. Two gene networks were computed using microarray data, one for breast cancer and one for prostate cancer. The results obtained revealed, on the one hand, distinct stromal cell behaviors in breast and prostate cancer and, on the other hand, a list of potential biomarkers for both diseases. In the case of breast tumor, ST6GAL2, RIPOR3, COL5A1, and DEPDC7 were found, and in the case of prostate tumor, the genes were GATA6-AS1, ARFGEF3, PRR15L, and APBA2. These results demonstrate the usefulness of the ensemble method in the field of biomarker discovery.
Dulcenombre M. Saz-Navarro; A. Lopez-Fernandez; F. Gómez-Vela; D. Rodríguez-Baena. CyEnGNet-App: A new Cytoscape app for the reconstruction of large co-expression networks using an ensemble approach. Journal Article. In: SoftwareX, vol. 25, pp. 101634, 2024, ISSN: 2352-7110. The construction of gene co-expression networks is an essential tool in Bioinformatics for discovering useful biological knowledge. There are a multitude of methodologies related to the construction of this type of network, and one of them is EnGNet, which carries out a joint and greedy approach to the reconstruction of large gene coexpression networks. This work introduces CyEnGNet-App, a Cytoscape application designed to integrate and leverage the EnGNet algorithm. The application allows dynamic interaction and visualisation of gene networks and integration with other Cytoscape applications. CyEnGNet-App is a valuable addition to the field of Bioinformatics, improving the reconstruction of genetic networks and providing a more accessible and efficient user experience in Cytoscape.
2021
A. Lopez-Fernandez; D. Rodríguez-Baena; F. Gómez-Vela; F. Divina; M. García-Torres. A multi-GPU biclustering algorithm for binary datasets. Journal Article. In: Journal of Parallel and Distributed Computing, vol. 147, pp. 209-219, 2021, ISSN: 0743-7315. Graphics Processing Units technology (GPU) and CUDA architecture are one of the most used options to adapt machine learning techniques to the huge amounts of complex data that are currently generated. Biclustering techniques are useful for discovering local patterns in datasets. Those of them that have been implemented to use GPU resources in parallel have improved their computational performance. However, this fact does not guarantee that they can successfully process large datasets. There are some important issues that must be taken into account, like the data transfers between CPU and GPU memory or the balanced distribution of workload between the GPU resources. In this paper, a GPU version of one of the fastest biclustering solutions, BiBit, is presented. This implementation, named gBiBit, has been designed to take full advantage of the computational resources offered by GPU devices. Either using a single GPU device or in its multi-GPU mode, gBiBit is able to process large binary datasets. The experimental results have shown that gBiBit improves the computational performance of BiBit, a CPU parallel version and an early GPU version, called ParBiBit and CUBiBit, respectively. gBiBit source code is available at https://github.com/aureliolfdez/gbibit.
2020
A. Lopez-Fernandez; D. Rodríguez-Baena; F. Gómez-Vela. gMSR: A Multi-GPU Algorithm to Accelerate a Massive Validation of Biclusters. Journal Article. In: Electronics, vol. 9, no. 11, pp. 1782, 2020, ISSN: 2079-9292. Nowadays, Biclustering is one of the most widely used machine learning techniques to discover local patterns in datasets from different areas such as energy consumption, marketing, social networks or bioinformatics, among others. Particularly in bioinformatics, Biclustering techniques have become extremely time-consuming, and the number of results generated has become huge, due to the continuous increase in the size of the databases over the last few years. For this reason, validation techniques must be adapted to this new environment in order to help researchers focus their efforts on a specific subset of results in an efficient, fast and reliable way. The aforementioned situation may well be considered a Big Data context. In this sense, multiple machine learning techniques have been implemented by the application of Graphic Processing Units (GPU) technology and CUDA architecture to accelerate the processing of large databases. However, as far as we know, this technology has not yet been applied to any bicluster validation technique. In this work, a multi-GPU version of one of the most used bicluster validation measures, Mean Squared Residue (MSR), is presented. It takes advantage of all the hardware and memory resources offered by GPU devices. Because of this, gMSR is able to validate a massive number of biclusters in any Biclustering-based study within a Big Data context.
F. M. Delgado; F. Gómez-Vela; F. Divina; M. García-Torres; D. Rodríguez-Baena. Computational Analysis of the Global Effects of Ly6E in the Immune Response to Coronavirus Infection Using Gene Networks. Journal Article. In: Genes, vol. 11, no. 7, pp. 831, 2020, ISSN: 2073-4425. Gene networks have arisen as a promising tool in the comprehensive modeling and analysis of complex diseases. Particularly in viral infections, the understanding of the host-pathogen mechanisms, and the immune response to these, is considered a major goal for the rational design of appropriate therapies. For this reason, the use of gene networks may well encourage therapy-associated research in the context of the coronavirus pandemic, orchestrating experimental scrutiny and reducing costs. In this work, gene co-expression networks were reconstructed from RNA-Seq expression data with the aim of analyzing the time-resolved effects of gene Ly6E in the immune response against the coronavirus responsible for murine hepatitis (MHV). Through the integration of differential expression analyses and reconstructed networks exploration, significant differences in the immune response to virus were observed in Ly6EΔHSC compared to wild type animals. Results show that Ly6E ablation at hematopoietic stem cells (HSCs) leads to a progressive impaired immune response in both liver and spleen. Specifically, depletion of the normal leukocyte mediated immunity and chemokine signaling is observed in the liver of Ly6EΔHSC mice. On the other hand, the immune response in the spleen, which seemed to be mediated by an intense chromatin activity in the normal situation, is replaced by ECM remodeling in Ly6EΔHSC mice. These findings, which require further experimental characterization, could be extrapolated to other coronaviruses and motivate the efforts towards novel antiviral approaches.
D. Rodríguez-Baena; F. Gómez-Vela; M. García-Torres; F. Divina; C. D. Barranco; N. Díaz-Díaz; M. Jiménez; G. Montalvo. Identifying livestock behavior patterns based on accelerometer dataset. Journal Article. In: Journal of Computational Science, vol. 41, pp. 101076, 2020, ISSN: 1877-7503. In large livestock farming it would be beneficial to be able to automatically detect behaviors in animals. In fact, this would make it possible to estimate the health status of individuals, providing valuable insight to stock raisers. Traditionally this process has been carried out manually, relying only on the experience of the breeders. Such an approach is effective for a small number of individuals. However, in large breeding farms this may not represent the best approach, since, in this way, not all the animals can be effectively monitored all the time. Moreover, the traditional approach relies heavily on human experience, which cannot always be taken for granted. To this aim, in this paper, we propose a new method for automatically detecting activity and inactivity time periods of animals, as a behavior indicator of livestock. In order to do this, we collected data with sensors located in the body of the animals to be analyzed. In particular, the reliability of the method was tested with data collected on Iberian pigs and calves. Results confirm that the proposed method can help breeders in detecting activity and inactivity periods for large livestock farming.
2019
F. Gómez-Vela; F. M. Delgado; D. Rodríguez-Baena; M. García-Torres; F. Divina. Ensemble and Greedy Approach for the Reconstruction of Large Gene Co-Expression Networks. Journal Article. In: Entropy, vol. 21, no. 12, pp. 1139, 2019. Gene networks have become a powerful tool in the comprehensive analysis of gene expression. Due to the increasing amount of available data, computational methods for networks generation must deal with the so-called curse of dimensionality in the quest for the reliability of the obtained results. In this context, ensemble strategies have significantly improved the precision of results by combining different measures or methods. On the other hand, structure optimization techniques are also important in the reduction of the size of the networks, not only improving their topology but also keeping a positive prediction ratio. In this work, we present Ensemble and Greedy networks (EnGNet), a novel two-step method for gene networks inference. First, EnGNet uses an ensemble strategy for co-expression networks generation. Second, a greedy algorithm optimizes both the size and the topological features of the network. Not only do achieved results show that this method is able to obtain reliable networks, but also that it significantly improves topological features. Moreover, the usefulness of the method is proven by an application to a human dataset on post-traumatic stress disorder, revealing an innate immunity-mediated response to this pathology. These results are indicative of the method’s potential in the field of biomarkers discovery and characterization.
2018
F. Gómez-Vela; D. Rodríguez-Baena; J. L. Vázquez-Noguera. Structure Optimization for Large Gene Networks Based on Greedy Strategy. Journal Article. In: Computational and Mathematical Methods in Medicine, vol. 2018, 2018. In the last few years, gene networks have become one of the most important tools to model biological processes. Among other utilities, these networks visually show biological relationships between genes. However, due to the large amount of the currently generated genetic data, their size has grown to the point of being unmanageable. To solve this problem, it is possible to use computational approaches, such as heuristics-based methods, to analyze and optimize a gene network’s structure by pruning irrelevant relationships. In this paper we present a new method, called GeSOp, to optimize large gene network structures. The method is able to perform a considerable pruning of the irrelevant relationships comprising the input network. To do so, the method is based on a greedy heuristic to obtain the most relevant subnetwork. The performance of our method was tested by means of two experiments on gene networks obtained from different organisms. The first experiment shows how GeSOp is able not only to carry out a significant reduction in the size of the network, but also to maintain the biological information ratio. In the second experiment, the ability to improve the biological indicators of the network is checked. Hence, the results presented show that GeSOp is a reliable method to optimize and improve the structure of large gene networks.
A. Lopez-Fernandez; D. Rodríguez-Baena; F. Gómez-Vela; N. Díaz-Díaz. BIGO: A web application to analyse gene enrichment analysis results. Journal Article. In: Computational biology and chemistry, vol. 76, pp. 169-178, 2018, ISSN: 1476-9271. Background and objective: Gene enrichment tools enable the analysis of the relationships between genes with biological annotations stored in biological databases. The results obtained by these tools are usually difficult to analyse. Therefore, researchers require new tools with friendly user interfaces available on all types of devices and new methods to make the analysis of the results easier. Methods: In this work, we present the BIGO Web tool. BIGO is a friendly Web tool to perform enrichment analyses of a collection of gene sets. On the basis of the obtained enrichment analysis results, BIGO combines the biological terms to organize them and graphically represents the relationships between gene sets to make the interpretations of the results easier. Results: BIGO offers useful services that provide the opportunity to focus on a concrete subset of results by discarding too general biological terms or to obtain useful knowledge by means of the visual analysis of the functional connections between the sets of genes being analysed. Conclusions: BIGO is a web tool with a novel and modern design that provides the possibility to improve the analysis tasks applied to gene enrichment results.
2017
F. Gómez-Vela; A. Lopez-Fernandez; J. A. Lagares; D. Rodríguez-Baena; C. D. Barranco; M. García-Torres; F. Divina. Bioinformatics from a Big Data Perspective: Meeting the Challenge. Conference. IWBBIO 2017: Bioinformatics and Biomedical Engineering, pp. 349-359, Springer International Publishing, Cham, 2017, ISBN: 978-3-319-56154-7. Recently, the rise of the Big Data paradigm has had a great impact in several fields. Bioinformatics is one such field. In fact, Bioinformatics had to evolve in order to adapt to this phenomenon. The exponential increase of the biological information available forced the researchers to find new solutions to handle these new challenges.
2013
D. Rodríguez-Baena. Extracting and validating biclusters from binary datasets. Journal Article. In: AI Communications, vol. 26, no. 4, pp. 417-418, 2013. This work proposes a novel algorithm to extract biclusters from binary datasets: the Bit-Pattern Biclustering Algorithm (BiBit). The selective search performed by BiBit, based on a very fast bit-word processing technique, provides very satisfactory results in quality and computational cost. In addition, a new software tool named CarGene (Characterization of Genes), which helps scientists to validate sets of genes using biological knowledge, is introduced.
2011
F. Gómez-Vela; F. Martínez-Álvarez; C. D. Barranco; N. Díaz-Díaz; D. Rodríguez-Baena; J. Aguilar-Ruiz. Pattern Recognition in Biological Time Series. Journal Article. In: Advances in Artificial Intelligence, pp. 164-172, 2011, ISBN: 978-3-642-25274-7. Knowledge extraction from gene expression data has been one of the main challenges in the bioinformatics field during the last few years. In this context, a particular kind of data, data retrieved on a temporal basis (also known as time series), provides information about the way a gene can be expressed over time. This work presents an exhaustive analysis of the latest proposals in this area, particularly focusing on those proposals using non-supervised machine learning techniques (i.e. clustering, biclustering and regulatory networks) to find relevant patterns in gene expression.
J. Aguilar-Ruiz; D. Rodríguez-Baena; N. Díaz-Díaz; I. A. Nepomuceno-Chamorro. CarGene: Characterisation of sets of genes based on metabolic pathways analysis. Journal Article. In: International Journal of Data Mining and Bioinformatics, vol. 5, no. 5, pp. 558-573, 2011. The great amount of biological information provides scientists with an incomparable framework for testing the results of new algorithms. Several tools have been developed for analysing gene-enrichment and most of them are Gene Ontology-based tools. We developed a Kyoto Encyclopedia of Genes and Genomes (KEGG)-based tool that provides a friendly graphical environment for analysing gene-enrichment. The tool integrates two statistical corrections and simultaneously analyses the information about many groups of genes in both a visual and a textual manner. We tested the usefulness of our approach on a previous analysis (Huttenshower et al.). Furthermore, our tool is freely available (http://www.upo.es/eps/bigs/cargene.html).
D. Rodríguez-Baena; A. J. Pérez-Pulido; J. Aguilar-Ruiz. A biclustering algorithm for extracting bit-patterns from binary datasets. Journal Article. In: Bioinformatics, vol. 27, no. 19, pp. 2738-2745, 2011, ISSN: 1367-4803. Motivation: Binary datasets represent a compact and simple way to store data about the relationships between a group of objects and their possible properties. In the last few years, different biclustering algorithms have been specially developed to be applied to binary datasets. Several approaches based on matrix factorization, suffix trees or divide-and-conquer techniques have been proposed to extract useful biclusters from binary data, and these approaches provide information about the distribution of patterns and intrinsic correlations. Results: A novel approach to extracting biclusters from binary datasets, BiBit, is introduced here. The results obtained from different experiments with synthetic data reveal the excellent performance and the robustness of BiBit to density and size of input data. Also, BiBit is applied to a central nervous system embryonic tumor gene expression dataset to test the quality of the results. A novel gene expression preprocessing methodology, based on expression level layers, and the selective search performed by BiBit, based on a very fast bit-pattern processing technique, provide very satisfactory results in quality and computational cost. The power of biclustering in finding genes involved simultaneously in different cancer processes is also shown. Finally, a comparison with Bimax, one of the most cited binary biclustering algorithms, shows that BiBit is faster while providing essentially the same results.
N. Díaz-Díaz; F. Gómez-Vela; D. Rodríguez-Baena; J. Aguilar-Ruiz. Gene Regulatory Networks Validation Framework Based in KEGG. Conference. Hybrid Artificial Intelligent Systems, 2011, ISBN: 978-3-642-21222-2. In the last few years, DNA microarray technology has attained a very important role in biological and biomedical research. It enables analyzing the relations among thousands of genes simultaneously, generating huge amounts of data. The gene regulatory networks represent, in a graph data structure, genes or gene products and the functional relationships between them. These models have been fully used in Bioinformatics because they provide an easy way to understand gene expression regulation.
2009
R. Alves; D. Rodríguez-Baena; J. Aguilar-Ruiz. Gene association analysis: a survey of frequent pattern mining from gene expression data. Journal Article. In: Briefings in Bioinformatics, vol. 11, no. 2, pp. 210-224, 2009, ISSN: 1467-5463. Establishing an association between variables is always of interest in genomic studies. Generation of DNA microarray gene expression data introduces a variety of data analysis issues not encountered in traditional molecular biology or medicine. Frequent pattern mining (FPM) has been applied successfully in business and scientific data for discovering interesting association patterns, and is becoming a promising strategy in microarray gene expression analysis. We review the most relevant FPM strategies, as well as surrounding main issues when devising efficient and practical methods for gene association analysis (GAA). We observed that, so far, scalability achieved by efficient methods does not imply biological soundness of the discovered association patterns, and vice versa. Ideally, GAA should employ a balanced mining model taking into account best practices employed by methods reviewed in this survey. Integrative approaches, in which biological knowledge plays an important role within the mining process, are becoming more reliable.
2007
I. A. Nepomuceno-Chamorro; J. Aguilar-Ruiz; N. Díaz-Díaz; D. Rodríguez-Baena; J. García. A Deterministic Model to Infer Gene Networks from Microarray Data. Conference. Intelligent Data Engineering and Automated Learning - IDEAL 2007, 2007, ISBN: 978-3-540-77226-2. Microarray experiments help researchers to construct the structure of gene regulatory networks, i.e., networks representing relationships among different genes. Filter and knowledge extraction processes are necessary in order to handle the huge amount of data produced by microarray technologies. We propose regression tree techniques as a method to identify gene networks. Regression trees are a very useful technique to estimate the numerical values for the target outputs. They are very often more precise than linear regression models because they can adjust different linear regressions to separate areas of the search space. In our approach, we generate a single regression tree for each gene from a set of genes, taking as input the remaining genes, to finally build a graph from all the relationships among output and input genes. In this paper, we will simplify the approach by setting a single seed, the gene ARN1, and building the graph around it. The final model might give some clues to understand the dynamics, the regulation or the topology of the gene network from one (or several) seeds, since it gathers relevant genes with accurate connections. The performance of our approach is experimentally tested on the yeast Saccharomyces cerevisiae dataset (Rosetta compendium).
D. Rodríguez-Baena; N. Díaz-Díaz; J. Aguilar-Ruiz; I. A. Nepomuceno-Chamorro. Discovering alpha–Patterns from Gene Expression Data. Conference. Intelligent Data Engineering and Automated Learning - IDEAL 2007, 2007, ISBN: 978-3-540-77226-2. The biclustering techniques have the purpose of finding subsets of genes that show similar activity patterns under a subset of conditions. In this paper we characterize a specific type of pattern, that we have called α–pattern, and present an approach that consists in a new biclustering algorithm specifically designed to find α–patterns, in which the gene expression values evolve across the experimental conditions showing a similar behavior inside a band that ranges from 0 up to a pre–defined threshold called α. The α value guarantees the co–expression among genes. We have tested our method on the Yeast dataset and compared the results to the biclustering algorithms of Cheng & Church (2000) and Aguilar & Divina (2005). Results show that the algorithm finds interesting biclusters, grouping genes with similar behaviors and maintaining a very low mean squared residue.
2006
N. Díaz-Díaz; D. Rodríguez-Baena; I. A. Nepomuceno-Chamorro; J. Aguilar-Ruiz. Neighborhood-Based Clustering of Gene-Gene Interactions. Conference. Intelligent Data Engineering and Automated Learning - IDEAL 2006, 2006, ISBN: 978-3-540-45487-8. In this work, we propose a new greedy clustering algorithm to identify groups of related genes. Clustering algorithms analyze genes in order to group those with similar behavior. Instead, our approach groups pairs of genes that present similar positive and/or negative interactions. Our approach presents some interesting properties. For instance, the user can specify how the range of each gene is going to be segmented (labels). Some of these will mean expressed or inhibited (depending on the gradation). From all the label combinations a function transforms each pair of labels into another one, that identifies the type of interaction. From these pairs of genes and their interactions we build clusters in a greedy, iterative fashion, as two pairs of genes will be similar if they have the same amount of relevant interactions. Initial two–gene clusters grow iteratively based on their neighborhood until the set of clusters does not change. The algorithm allows the researcher to modify all the criteria: discretization mapping function, gene–gene mapping function and filtering function, and provides much flexibility to obtain clusters based on the level of precision needed. The performance of our approach is experimentally tested on the yeast dataset. The final number of clusters is low and genes within show a significant level of cohesion, as it is shown graphically in the experiments.
2003
J. Aguilar-Ruiz; D. Rodríguez-Baena; P. R. Cohen; J. C. Riquelme. Clustering Main Concepts from e-Mails. Conference. Current Topics in Artificial Intelligence, 2003, ISBN: 978-3-540-25945-9. E–mail is one of the most common ways to communicate, accounting, in some cases, for up to 75% of a company’s communication, in which every employee spends about 90 minutes a day in e–mail tasks such as filing and deleting. This paper deals with the generation of clusters of relevant words from E–mail texts. Our approach consists of the application of text mining techniques and, later, data mining techniques, to obtain related concepts extracted from sent and received messages. We have developed a new clustering algorithm based on neighborhood, which takes into account similarity values among words obtained in the text mining phase. The potential of these applications is enormous and only a few companies, mainly large organizations, have invested in this project so far, taking advantage of employees’ knowledge in future decisions.