Publications
2024 |
Insfrán-Coronel, D. R.; Enrique-Sánchez, E. M.; Beck, Federico; Lopez-Fernandez, A.; García-Torres, M. Analysis of School Dropout Rate in Paraguay Using a Machine Learning Approach Conference International Joint Conferences: 15th International Conference on European Transnational Education (ICEUTE 2024), Springer Nature Switzerland, 2024, ISBN: 978-3-031-75016-8. Abstract | Links | BibTeX | Tags: @conference{Insfrán-Coronel2024, This study investigates the school dropout rates in Paraguay, focusing on the transition from ninth grade to the first year of secondary school in the Concepción department. Using an extract, transform, and load (ETL) process, data from the Paraguayan Ministry of Education and Science and the National Institute of Statistics were analyzed. The research employs clustering techniques, particularly K-means, to identify patterns and risk profiles among students. The findings highlight the significant impact of socio-economic factors, such as poverty and child labor, on school dropout rates. These insights aim to inform targeted interventions to improve educational outcomes and reduce dropout rates in Paraguay. |
Vázquez-Noguera, S.; Martínez, F.; Becerra-Alonso, D.; Lopez-Fernandez, A.; Lopez-Cobo, I.; Sosa, P. Effects of Language of Instruction in Higher Education Conference International Joint Conferences: 15th International Conference on European Transnational Education (ICEUTE 2024), Springer Nature Switzerland, 2024, ISBN: 978-3-031-75016-8. Abstract | Links | BibTeX | Tags: @conference{Vázquez-Noguera2024, In higher education, institutions are paying attention to student mobility and internationalization to increase their visibility. The resulting increase in the intake of international students has brought not only improved global competitiveness but also economic benefits. However, teaching in a language other than the students' mother tongue requires high-level proficiency in that language. Therefore, in this work, we explore the impact of the language of instruction on the business administration and management degree at Universidad Loyola (Spain). We analyze and compare the student profile according to the language of instruction using a clustering approach. The results suggest that students who receive instruction in a foreign language achieve better performance than those who receive it in their mother tongue. However, the number of students who decide to study in a foreign language is smaller. |
Torres-Báez, J. A.; Torres-Báez, J. B.; Lopez-Fernandez, A.; Gomez-Vela, F.; Beck, Federico J. Exploring Educational Trends: Specializations in Secondary Education in Paraguay from 2018 to 2021 Conference International Joint Conferences: 15th International Conference on European Transnational Education (ICEUTE 2024), Springer Nature Switzerland, 2024, ISBN: 978-3-031-75016-8. Abstract | Links | BibTeX | Tags: @conference{Torres-Báez2024, Paraguay’s education system has undergone significant changes, particularly at the secondary level, introducing new specializations and teaching methods. While this diversity offers students unique opportunities, it also presents challenges in selecting a suitable specialty. Examining the variety and demand of specializations provides insights into educational trends and job market needs. Analyzing gender distribution across fields can help address disparities. Additionally, factors like accessibility, curriculum variety, overage students, and indigenous inclusion must be considered. Advanced methods like exploratory data analysis (EDA) are essential for understanding these complexities. This study introduces a tool for EDA and comprehensive investigation of enrollment data, aiming to provide valuable insights for students. The importance of EDA in educational research is emphasized, along with advancements … |
Lopez-Fernandez, A.; Gallejones-Eskubi, J.; Saz-Navarro, Dulcenombre M.; Gómez-Vela, F. Breast Cancer Biomarker Analysis Using Gene Co-expression Networks Conference IWBBIO 2024: International Work-Conference on Bioinformatics and Biomedical Engineering, Springer Nature Switzerland, 2024, ISBN: 978-3-031-64636-2. Abstract | Links | BibTeX | Tags: Bioinformatics, Biomarkers, Gene co-expression network @conference{Lopez-Fernandez2024c, Gene co-expression networks have emerged as a robust tool for conducting comprehensive analyses of gene expression patterns. These networks, constructed through inference algorithms, facilitate the exploration of various biological processes and enable the identification of novel biomarkers from which to explore new lines of disease research. This work found that breast cancer stromal cells are strongly dysregulated in genes related to modifications in cellular structures that hold stromal tissue cells together, inflammatory responses, and molecules implicated in immune system regulation. Finally, ANAPC11, LRFN5, COL8A2, TEX11, DOCK9, CPLX1, LONP2, and LAT2 biomarkers were suggested in the context of stromal breast tumors. |
Lopez-Fernandez, A.; Gómez-Vela, F.; Saz-Navarro, Dulcenombre M.; Delgado, F. M.; Rodríguez-Baena, D. Optimized Python library for reconstruction of ensemble-based gene co-expression networks using multi-GPU Journal Article In: The Journal of Supercomputing, 2024, ISSN: 1573-0484. Abstract | Links | BibTeX | Tags: Big Data, Bioinformatics, Data Mining, Gene co-expression network, GPU, High-Performance Computing @article{Lopez-Fernandez2024b, Gene co-expression networks are valuable tools for discovering biologically relevant information within gene expression data. However, analysing large datasets presents challenges due to the identification of nonlinear gene–gene associations and the need to process an ever-growing number of gene pairs and their potential network connections. These challenges mean that some experiments are discarded because the techniques do not support these intense workloads. This paper presents pyEnGNet, a Python library that can generate gene co-expression networks in High-performance computing environments. To do this, pyEnGNet harnesses CPU and multi-GPU parallel computing resources, efficiently handling large datasets. These implementations have optimised memory management and processing, delivering timely results. We have used synthetic datasets to prove the runtime and intensive workload improvements. In addition, pyEnGNet was used in a real-life study of patients after allogeneic stem cell transplantation with invasive aspergillosis and was able to detect biological perspectives in the study. |
Figueroa-Martinez, J.; Saz-Navarro, Dulcenombre M.; Lopez-Fernandez, A.; Rodríguez-Baena, D.; Gómez-Vela, F. Computational Ensemble Gene Co-Expression Networks for the Analysis of Cancer Biomarkers Journal Article In: Informatics, vol. 11, no. 2, pp. 14, 2024, ISSN: 2227-9709. Abstract | Links | BibTeX | Tags: Bioinformatics, Biomarkers, Breast cancer, Gene co-expression network, Prostate cancer, Stromal tissue @article{Figueroa-Martinez2024, Gene networks have become a powerful tool for the comprehensive examination of gene expression patterns. Thanks to these networks generated by means of inference algorithms, it is possible to study different biological processes and even identify new biomarkers for such diseases. These biomarkers are essential for the discovery of new treatments for genetic diseases such as cancer. In this work, we introduce an algorithm for genetic network inference based on an ensemble method that improves the robustness of the results by combining two main steps: first, the evaluation of the relationship between pairs of genes using three different co-expression measures, and, subsequently, a voting strategy. The utility of this approach was demonstrated by applying it to a human dataset encompassing breast and prostate cancer-associated stromal cells. Two gene networks were computed using microarray data, one for breast cancer and one for prostate cancer. The results obtained revealed, on the one hand, distinct stromal cell behaviors in breast and prostate cancer and, on the other hand, a list of potential biomarkers for both diseases. In the case of breast tumor, ST6GAL2, RIPOR3, COL5A1, and DEPDC7 were found, and in the case of prostate tumor, the genes were GATA6-AS1, ARFGEF3, PRR15L, and APBA2. These results demonstrate the usefulness of the ensemble method in the field of biomarker discovery. |
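The two-step idea described in the abstract above (score each gene pair with three co-expression measures, then vote) can be illustrated with a minimal sketch. The abstract does not name the three measures, the threshold, or the voting rule, so everything below is an assumption chosen for illustration: Pearson, Spearman and Kendall (tau-a) as the measures, a hypothetical `threshold` of 0.8, and a simple majority (`min_votes=2`). It is not the authors' implementation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall(x, y):
    """Kendall tau-a: (concordant - discordant) / number of sample pairs."""
    score, pairs = 0, 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)
        pairs += 1
    return score / pairs if pairs else 0.0


def ensemble_network(expression, threshold=0.8, min_votes=2):
    """Keep a gene pair as an edge when enough measures exceed the threshold.

    `expression` maps gene name -> list of expression values (hypothetical
    input format). `threshold` and `min_votes` are illustrative defaults.
    """
    measures = (pearson, spearman, kendall)
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        votes = sum(
            abs(measure(expression[gene_a], expression[gene_b])) >= threshold
            for measure in measures
        )
        if votes >= min_votes:
            edges.append((gene_a, gene_b))
    return edges
```

For example, `ensemble_network({"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10], "C": [3, 1, 4, 1, 5]})` keeps only the pair `("A", "B")`, since all three measures agree on it and none supports the pairs involving `C`.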
Lopez-Fernandez, A.; Gómez-Vela, F.; González-Domínguez, J.; Bidare-Divakarachari, P. bioScience: A new python science library for high-performance computing bioinformatics analytics Journal Article In: SoftwareX, vol. 26, pp. 101666, 2024, ISSN: 2352-7110. Abstract | Links | BibTeX | Tags: Bioinformatics, Data analysis, Data Mining, Data science, High-Performance Computing @article{Lopez-Fernandez2024, BioScience is an advanced Python library designed to satisfy the growing data analysis needs in the field of bioinformatics by leveraging High-Performance Computing (HPC). This library encompasses a vast multitude of functionalities, from loading specialized gene expression datasets (microarrays, RNA-Seq, etc.) to preprocessing techniques and data mining algorithms suitable for this type of datasets. BioScience is distinguished by its capacity to manage large amounts of biological data, providing users with efficient and scalable tools for the analysis of genomic and transcriptomic data through the use of parallel architectures for clusters composed of CPUs and GPUs. |
Saz-Navarro, Dulcenombre M.; Lopez-Fernandez, A.; Gómez-Vela, F.; Rodríguez-Baena, D. CyEnGNet—App: A new Cytoscape app for the reconstruction of large co-expression networks using an ensemble approach Journal Article In: SoftwareX, vol. 25, pp. 101634, 2024, ISSN: 2352-7110. Abstract | Links | BibTeX | Tags: Bioinformatics, Cytoscape, Gene networks, Network analysis, Visualisation @article{Saz-Navarro2024, The construction of gene co-expression networks is an essential tool in Bioinformatics for discovering useful biological knowledge. There are a multitude of methodologies related to the construction of this type of network, and one of them is EnGNet, which carries out a joint and greedy approach to the reconstruction of large gene coexpression networks. This work introduces CyEnGNet-App, a Cytoscape application designed to integrate and leverage the EnGNet algorithm. The application allows dynamic interaction and visualisation of gene networks and integration with other Cytoscape applications. CyEnGNet-App is a valuable addition to the field of Bioinformatics, improving the reconstruction of genetic networks and providing a more accessible and efficient user experience in Cytoscape. |
2021 |
Lopez-Fernandez, A.; Rodríguez-Baena, D.; Gómez-Vela, F.; Divina, F.; García-Torres, M. A multi-GPU biclustering algorithm for binary datasets Journal Article In: Journal of Parallel and Distributed Computing, vol. 147, pp. 209-219, 2021, ISSN: 0743-7315. Abstract | Links | BibTeX | Tags: Biclustering, Big Data, CUDA, GPU @article{Lopez-Fernandez2020, Graphics Processing Unit (GPU) technology and the CUDA architecture are among the most widely used options for adapting machine learning techniques to the huge amounts of complex data currently being generated. Biclustering techniques are useful for discovering local patterns in datasets. Those that have been implemented to use GPU resources in parallel have improved their computational performance. However, this does not guarantee that they can successfully process large datasets. Some important issues must be taken into account, such as data transfers between CPU and GPU memory or the balanced distribution of workload among the GPU resources. In this paper, a GPU version of one of the fastest biclustering solutions, BiBit, is presented. This implementation, named gBiBit, has been designed to take full advantage of the computational resources offered by GPU devices. Whether using a single GPU device or running in its multi-GPU mode, gBiBit is able to process large binary datasets. The experimental results show that gBiBit outperforms BiBit as well as a parallel CPU version and an early GPU version, called ParBiBit and CUBiBit, respectively. gBiBit source code is available at https://github.com/aureliolfdez/gbibit. |
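The abstract above does not describe how biclusters are found in binary data, so the following is only a rough CPU sketch of a bit-pattern approach of the kind BiBit is generally understood to use: each row is encoded as a bit word, the AND of a pair of rows gives a candidate column pattern, and every row containing that pattern joins the bicluster. The function name `bit_pattern_biclusters` and the parameters `min_rows` and `min_cols` are hypothetical, and nothing here reflects the GPU work distribution or memory handling of gBiBit.

```python
from itertools import combinations


def bit_pattern_biclusters(rows, min_rows=2, min_cols=2):
    """Return (row_indices, column_indices) pairs found from pairwise AND patterns.

    `rows` is a list of equal-length lists of 0/1 values.
    """
    n_cols = len(rows[0])
    # Encode each binary row as one integer so a pattern is a single AND.
    encoded = [int("".join(map(str, row)), 2) for row in rows]
    seen_patterns = set()
    biclusters = []
    for i, j in combinations(range(len(rows)), 2):
        pattern = encoded[i] & encoded[j]
        # Skip patterns with too few columns or patterns already expanded.
        if bin(pattern).count("1") < min_cols or pattern in seen_patterns:
            continue
        seen_patterns.add(pattern)
        # A row belongs to the bicluster when it contains every 1 of the pattern.
        members = [k for k, word in enumerate(encoded) if (word & pattern) == pattern]
        if len(members) >= min_rows:
            columns = [c for c in range(n_cols) if (pattern >> (n_cols - 1 - c)) & 1]
            biclusters.append((members, columns))
    return biclusters
```

Because each row pair is handled independently apart from the shared set of seen patterns, this kind of loop is a natural candidate for parallel execution, which is consistent with the workload-balancing concern raised in the abstract.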
2020 |
Lopez-Fernandez, A.; Rodríguez-Baena, D.; Gómez-Vela, F. gMSR: A Multi-GPU Algorithm to Accelerate a Massive Validation of Biclusters Journal Article In: Electronics, vol. 9, no. 11, pp. 1782, 2020, ISSN: 2079-9292. Abstract | Links | BibTeX | Tags: Biclustering, Biclustering validation, CUDA, GPU, MSR @article{Lopez-Fernandez2020b, Nowadays, Biclustering is one of the most widely used machine learning techniques to discover local patterns in datasets from different areas such as energy consumption, marketing, social networks or bioinformatics, among others. Particularly in bioinformatics, Biclustering techniques have become extremely time-consuming, and the number of results they generate has become huge, due to the continuous increase in the size of databases over the last few years. For this reason, validation techniques must be adapted to this new environment in order to help researchers focus their efforts on a specific subset of results in an efficient, fast and reliable way. This situation may well be considered a Big Data context. In this sense, multiple machine learning techniques have been implemented using Graphics Processing Unit (GPU) technology and the CUDA architecture to accelerate the processing of large databases. However, as far as we know, this technology has not yet been applied to any bicluster validation technique. In this work, a multi-GPU version of one of the most used bicluster validation measures, Mean Squared Residue (MSR), is presented. It takes advantage of all the hardware and memory resources offered by GPU devices. Because of this, gMSR is able to validate a massive number of biclusters in any Biclustering-based study within a Big Data context. |
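For readers unfamiliar with the measure named in the abstract above, the Mean Squared Residue of a bicluster is usually defined as the average of the squared residues (a_ij - rowMean_i - colMean_j + overallMean)^2 over all its cells, with lower values indicating a more coherent bicluster. The plain-Python sketch below follows that standard definition on a single bicluster; the function name is hypothetical and it says nothing about how gMSR organises the computation across GPUs.

```python
def mean_squared_residue(bicluster):
    """Mean Squared Residue of a bicluster given as a list of equal-length rows."""
    n_rows = len(bicluster)
    n_cols = len(bicluster[0])
    row_means = [sum(row) / n_cols for row in bicluster]
    col_means = [sum(row[j] for row in bicluster) / n_rows for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    total = 0.0
    for i, row in enumerate(bicluster):
        for j, value in enumerate(row):
            # Residue: what is left after removing row, column and global effects.
            residue = value - row_means[i] - col_means[j] + overall_mean
            total += residue * residue
    return total / (n_rows * n_cols)
```

A bicluster whose rows differ only by a constant shift, such as `[[1, 2, 3], [2, 3, 4], [4, 5, 6]]`, has an MSR of zero up to floating-point rounding. Validating many biclusters means repeating this independent computation once per bicluster, which is why the task lends itself to massive parallelism.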
Delgado, F. M.; Gómez-Vela, F.; Divina, F.; García-Torres, M.; Rodríguez-Baena, D. Computational Analysis of the Global Effects of Ly6E in the Immune Response to Coronavirus Infection Using Gene Networks Journal Article In: Genes, vol. 11, no. 7, pp. 831, 2020, ISSN: 2073-4425. Abstract | Links | BibTeX | Tags: Data Mining, Gene Network, murine coronavirus, Systems biology, Viral infection @article{Delgado2020, Gene networks have arisen as a promising tool in the comprehensive modeling and analysis of complex diseases. Particularly in viral infections, the understanding of the host-pathogen mechanisms, and the immune response to these, is considered a major goal for the rational design of appropriate therapies. For this reason, the use of gene networks may well encourage therapy-associated research in the context of the coronavirus pandemic, orchestrating experimental scrutiny and reducing costs. In this work, gene co-expression networks were reconstructed from RNA-Seq expression data with the aim of analyzing the time-resolved effects of gene Ly6E in the immune response against the coronavirus responsible for murine hepatitis (MHV). Through the integration of differential expression analyses and reconstructed networks exploration, significant differences in the immune response to virus were observed in Ly6EΔHSC compared to wild type animals. Results show that Ly6E ablation at hematopoietic stem cells (HSCs) leads to a progressive impaired immune response in both liver and spleen. Specifically, depletion of the normal leukocyte mediated immunity and chemokine signaling is observed in the liver of Ly6EΔHSC mice. On the other hand, the immune response in the spleen, which seemed to be mediated by an intense chromatin activity in the normal situation, is replaced by ECM remodeling in Ly6EΔHSC mice. These findings, which require further experimental characterization, could be extrapolated to other coronaviruses and motivate the efforts towards novel antiviral approaches. |
Aram, B.; Lopez-Fernandez, A.; Muñiz-Amian, D. The integration of heterogeneous information from diverse disciplines regarding persons and goods Journal Article In: Digital Scholarship in the Humanities, 2020. Abstract | Links | BibTeX | Tags: Databases, Digital Humanities @article{Aram2020, This article presents a relational database capable of integrating data from a variety of types of written sources as well as material remains. In response to historical research questions, information from such diverse sources as documentary, bioanthropological, isotopic, and DNA analyses has been assessed, homogenized, and situated in time and space. Multidisciplinary ontologies offer complementary and integrated perspectives regarding persons and goods. While responding to specific research questions about the impact of globalization on the isthmus of Panama during the sixteenth and seventeenth centuries, the data model and user interface promote the ongoing interrogation of diverse information about complex, changing societies. To this end, the application designed makes it possible to search, consult, and download data that researchers have contributed from anywhere in the world. |
Rodríguez-Baena, D.; Gómez-Vela, F.; García-Torres, M.; Divina, F.; Barranco, C. D.; Díaz-Díaz, N.; Jiménez, M.; Montalvo, G. Identifying livestock behavior patterns based on accelerometer dataset Journal Article In: Journal of Computational Science, vol. 41, pp. 101076, 2020, ISSN: 1877-7503. Abstract | Links | BibTeX | Tags: Livestock activity, Pattern recognition, Time series processing @article{Rodríguez-Baena2020, In large livestock farming it would be beneficial to be able to automatically detect behaviors in animals. In fact, this would make it possible to estimate the health status of individuals, providing valuable insight to stock raisers. Traditionally this process has been carried out manually, relying only on the experience of the breeders. Such an approach is effective for a small number of individuals. However, in large breeding farms this may not be the best approach, since not all the animals can be effectively monitored all the time. Moreover, the traditional approach relies heavily on human experience, which cannot always be taken for granted. To address this, in this paper we propose a new method for automatically detecting activity and inactivity time periods of animals, as a behavior indicator of livestock. In order to do this, we collected data with sensors located on the body of the animals to be analyzed. In particular, the reliability of the method was tested with data collected on Iberian pigs and calves. Results confirm that the proposed method can help breeders in detecting activity and inactivity periods for large livestock farming. |
2019 |
Gómez-Vela, F.; Delgado, F. M.; Rodríguez-Baena, D.; García-Torres, M.; Divina, F. Ensemble and Greedy Approach for the Reconstruction of Large Gene Co-Expression Networks Journal Article In: Entropy, vol. 21, no. 12, pp. 1139, 2019. Abstract | Links | BibTeX | Tags: Ensemble networks, Gene Network @article{Gómez-Vela2019, Gene networks have become a powerful tool in the comprehensive analysis of gene expression. Due to the increasing amount of available data, computational methods for networks generation must deal with the so-called curse of dimensionality in the quest for the reliability of the obtained results. In this context, ensemble strategies have significantly improved the precision of results by combining different measures or methods. On the other hand, structure optimization techniques are also important in the reduction of the size of the networks, not only improving their topology but also keeping a positive prediction ratio. In this work, we present Ensemble and Greedy networks (EnGNet), a novel two-step method for gene networks inference. First, EnGNet uses an ensemble strategy for co-expression networks generation. Second, a greedy algorithm optimizes both the size and the topological features of the network. Not only do achieved results show that this method is able to obtain reliable networks, but also that it significantly improves topological features. Moreover, the usefulness of the method is proven by an application to a human dataset on post-traumatic stress disorder, revealing an innate immunity-mediated response to this pathology. These results are indicative of the method’s potential in the field of biomarkers discovery and characterization. |
Díaz-Montaña, J. J.; Díaz-Díaz, N.; Barranco, C. D.; Ponzoni, I. Development and use of a Cytoscape app for GRNCOP2 Journal Article In: Computer Methods and Programs in Biomedicine, vol. 177, pp. 211-218, 2019, ISSN: 0169-2607. Abstract | Links | BibTeX | Tags: Cytoscape @article{Díaz-Montaña2019, Background and Objective: Gene regulatory networks (GRNs) are essential for understanding most molecular processes. In this context, the so-called model-free approaches have an advantage in modeling the complex topologies behind these dynamic molecular networks, since most GRNs are difficult to map correctly by any other mathematical model. Abstract model-free approaches, also known as rule-based extraction methods, offer valuable benefits when performing data-driven analysis, such as requiring the least amount of data and simplifying the inference of large models at a faster analysis speed. In particular, GRNCOP2 is a combinatorial optimization method with an adaptive criterion for the discretization of gene expression data and high performance, in contrast to other rule-based extraction methods for discovering GRNs. However, the analysis of the large relational structures of the networks inferred by GRNCOP2 requires the support of effective tools for interactive network visualization and topological analysis of the extracted associations. This need motivated the possibility of integrating GRNCOP2 in the Cytoscape ecosystem in order to benefit from Cytoscape’s core functionality, as well as all the other apps in its ecosystem. Methods: In this paper, we introduce the implementation of a GRNCOP2 Cytoscape app. This incorporation into the Cytoscape platform includes new functionality for GRN visualization, dynamic user interaction and integration with other apps for topological analysis of the networks. Results: In order to demonstrate the usefulness of integrating GRNCOP2 in Cytoscape, the new app was used to tackle a novel use case for GRNCOP2: the analysis of crosstalk between pathways. In this regard, datasets associated with Alzheimer’s disease (AD) were analyzed using the GRNCOP2 app and other apps of the Cytoscape ecosystem by performing a topological analysis of the AD progression and its synchronization with the Ubiquitin Mediated Proteolysis pathway. Finally, the biological relevance of the findings achieved by this new app was evaluated by searching for evidence in the literature. Conclusions: The proposed crosstalk analysis with the new GRNCOP2 app focused on assessing the phase of Alzheimer’s disease progression in which coordination with the Ubiquitin Mediated Proteolysis pathway increases, and on identifying the genes that explain the signalling between these cellular processes. Both questions were explored by topological contrastive analysis of the GRNs generated by the GRNCOP2 app, where several facilities of Cytoscape were exploited. The topological patterns inferred by this new app are consistent with biological evidence reported in the scientific literature, illustrating the effectiveness of using the new GRNCOP2 app in pathway analysis. Availability: The GRNCOP2 app is freely available at the official Cytoscape app store: http://apps.cytoscape.org/apps/grncop2 |
García-Torres, M.; Becerra-Alonso, D.; Gómez-Vela, F.; Divina, F.; López-Cobo, I.; Martínez-Álvarez, F. Analysis of Student Achievement Scores: A Machine Learning Approach Conference International Joint Conference: 12th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2019) and 10th International Conference on EUropean Transnational Education (ICEUTE 2019), 2019, ISBN: 978-3-030-20005-3. Abstract | Links | BibTeX | Tags: Data Mining @conference{García-Torres2019, Educational Data Mining (EDM) is an emerging discipline of increasing interest due to several factors, such as the adoption of learning management systems in educational environments. In this work we analyze the predictive power of continuous evaluation activities with respect to overall student performance in a physics course at Universidad Loyola Andalucía, in Seville, Spain. The data were collected during the fall semester of 2018, and we applied several classification algorithms as well as feature selection strategies. Results suggest that several activities are not really relevant and, therefore, machine learning techniques may help to design new relevant and non-redundant activities for enhancing student knowledge acquisition in the physics course. These results may be extrapolated to other courses. |
Delgado, F. M.; Gómez-Vela, F. Computational methods for Gene Regulatory Networks reconstruction and analysis: A review Journal Article In: Artificial Intelligence in Medicine, vol. 95, pp. 133-145, 2019, ISSN: 0933-3657. Abstract | Links | BibTeX | Tags: Gene Network, Systems biology @article{Delgado2019, In recent years, the vast amount of genetic information generated by new-generation approaches has led to the need for new data handling methods. The integrative analysis of gene information of diverse nature could provide a much-sought overview for studying complex biological systems and processes. In this sense, Gene Regulatory Networks (GRNs) arise as an increasingly promising tool for the modelling and analysis of biological processes. This review is an attempt to summarize the state of the art in the field of GRNs. Essential points in the field are addressed, namely: (a) the type of data used for network generation, (b) machine learning methods and tools used for network generation, (c) model optimization and (d) computational approaches used for network validation. This survey is intended to provide an overview of the subject so that readers can improve their knowledge of the field of GRNs for future research. |
2018 |
Medina, J. M.; Barranco, C. D.; Pons, O. Indexing techniques to improve the performance of necessity-based fuzzy queries using classical indexing of RDBMS Journal Article In: Fuzzy Sets and Systems, vol. 351, pp. 90-107, 2018, ISSN: 0165-0114. Abstract | Links | BibTeX | Tags: Fuzzy databases @article{Medina2017, It is widely known that the most effective way to implement a fuzzy database is to use a classical Relational Database Management System (RDBMS) as the basis. All these systems provide several kinds of indexing methods to improve the execution time of classical queries, but they are useless when directly applied to fuzzy queries. For this reason, in this work we propose and evaluate several fuzzy indexing techniques implemented over the indexing techniques available on classical RDBMS in order to enhance flexible queries when based on the necessity measure. As the results show, the best evaluated fuzzy indexing techniques can be implemented on top of classical RDBMS. |
Cravero, F.; Schustik, S.; Martínez, M. J.; Barranco, C. D.; Díaz, M. F.; Ponzoni, I. Practical Applications of Computational Biology and Bioinformatics, 12th International Conference, 2018, ISBN: 978-3-319-98702-6. Abstract | Links | BibTeX | Tags: Artificial intelligence, Feature selection @conference{Cravero2018, QSPR (Quantitative Structure-Property Relationship) models proposed in Polymer Informatics typically use reduced computational representations of polymers for avoiding the complex issues related with the polydispersion of these industrial materials. In this work, the aim is to assess the effect of this oversimplification in the modelling decisions and to analyze strategies for addressing alternative characterizations of the materials that capture, at least partially, the polydispersion phenomenon. In particular, a cheminformatic study for estimating a tensile property of polymers is presented here. Four different computational representations are analyzed in combination with several machine learning approaches for selecting the most relevant molecular descriptors associated with the target property and for learning the corresponding QSPR models. The obtained results give insight about the limitations of using oversimplified representations of polymers and contribute with alternative strategies for achieving more realistic models. |
Medina, J. M.; Barranco, C. D.; Pons, O. 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2018. Abstract | Links | BibTeX | Tags: Fuzzy databases @conference{Medina2018, In this paper we address the implementation of the most efficient indexing techniques, according to the current literature, on a Fuzzy Object Relational Database Management System, using the extension mechanisms of the underlying Object Relational Database Management System, and we study and compare their technical feasibility and performance on a real system. Results show that these techniques are very effective and can improve the query execution time by several orders of magnitude with respect to sequential retrieval, with the BT being the simplest in terms of implementation feasibility. |
Gómez-Vela, F.; Rodríguez-Baena, D.; Vázquez-Noguera, J. L. Structure Optimization for Large Gene Networks Based on Greedy Strategy Journal Article In: Computational and Mathematical Methods in Medicine, vol. 2018, 2018. Abstract | Links | BibTeX | Tags: Gene Network, Soft Computing @article{Gómez-Vela2018, In the last few years, gene networks have become one of the most important tools to model biological processes. Among other utilities, these networks visually show biological relationships between genes. However, due to the large amount of genetic data currently generated, their size has grown to the point of being unmanageable. To solve this problem, it is possible to use computational approaches, such as heuristics-based methods, to analyze and optimize a gene network’s structure by pruning irrelevant relationships. In this paper we present a new method, called GeSOp, to optimize large gene network structures. The method is able to perform a considerable pruning of the irrelevant relationships in the input network. To do so, the method relies on a greedy heuristic to obtain the most relevant subnetwork. The performance of our method was tested by means of two experiments on gene networks obtained from different organisms. The first experiment shows how GeSOp is able not only to carry out a significant reduction in the size of the network, but also to maintain the biological information ratio. In the second experiment, the ability to improve the biological indicators of the network is checked. Hence, the results presented show that GeSOp is a reliable method to optimize and improve the structure of large gene networks. |
Lopez-Fernandez, A.; Rodríguez-Baena, D.; Gómez-Vela, F.; Díaz-Díaz, N. BIGO: A web application to analyse gene enrichment analysis results Journal Article In: Computational biology and chemistry, vol. 76, pp. 169-178, 2018, ISSN: 1476-9271. Abstract | Links | BibTeX | Tags: Bioinformatics, Biological validation, Gene enrichment analysis @article{Lopez-Fernandez2018, Background and objective Gene enrichment tools enable the analysis of the relationships between genes with biological annotations stored in biological databases. The results obtained by these tools are usually difficult to analyse. Therefore, researchers require new tools with friendly user interfaces available on all types of devices and new methods to make the analysis of the results easier. Methods In this work, we present the BIGO Web tool. BIGO is a friendly Web tool to perform enrichment analyses of a collection of gene sets. On the basis of the obtained enrichment analysis results, BIGO combines the biological terms to organize them and graphically represents the relationships between gene sets to make the interpretations of the results easier. Results BIGO offers useful services that provide the opportunity to focus on a concrete subset of results by discarding too general biological terms or to obtain useful knowledge by means of the visual analysis of the functional connections between the sets of genes being analysed. Conclusions BIGO is a web tool with a novel and modern design that provides the possibility to improve the analysis tasks applied to gene enrichment results. |
Díaz-Montaña, J. J.; Gómez-Vela, F.; Díaz-Díaz, N. GNC–app: A new Cytoscape app to rate gene networks biological coherence using gene–gene indirect relationships Journal Article In: Biosystems, vol. 166, pp. 61-65, 2018, ISSN: 0303-2647. Abstract | Links | BibTeX | Tags: Cytoscape, Gene Network @article{Díaz-Montaña2018, Motivation Gene networks are currently considered a powerful tool to model biological processes in the Bioinformatics field. A number of approaches to infer gene networks and various software tools to handle them in a visual simplified way have been developed recently. However, there is still a need to assess the inferred networks in order to prove their relevance. Results In this paper, we present the new GNC-app for Cytoscape. GNC-app implements the GNC methodology for assessing the biological coherence of gene association networks and integrates it into Cytoscape. Implemented de novo, GNC-app significantly improves the performance of the original algorithm in order to be able to analyse large gene networks more efficiently. It has also been integrated in Cytoscape to increase the tool accessibility for non-technical users and facilitate the visual analysis of the results. This integration allows the user to analyse not only the global biological coherence of the network, but also the biological coherence at the gene–gene relationship level. It also allows the user to leverage Cytoscape capabilities as well as its rich ecosystem of apps to perform further analyses and visualizations of the network using such data. Availability The GNC-app is freely available at the official Cytoscape app store: http://apps.cytoscape.org/apps/gnc. |
Martínez-García, P. M.; García-Torres, M.; Divina, F.; Gómez-Vela, F.; Cortés-Ledesma, F. Applications of Evolutionary Computation, 2018, ISBN: 978-3-319-77538-8. Abstract | Links | BibTeX | Tags: Binding sites, Classification, Feature selection @conference{Martínez-García2018, Topoisomerases are proteins that regulate the topology of DNA by introducing transient breaks to relax supercoiling. In this paper we focus our attention on Topoisomerases 2 (TOP2), which generate double-strand DNA breaks that, if inefficiently repaired, can seriously compromise genomic stability. It is then important to gain insights on the molecular processes involved in TOP2-DNA binding. In order to do this, we collected genomic and epigenomic information from publicly available high-throughput sequencing projects and systematically quantified them within experimentally measured TOP2 binding sites. We then applied feature selection techniques in order to both increase the performance of classification and to gain insight on the particular properties that can be of biological relevance. Results obtained allowed us to identify a core set of predictive chromatin features that faithfully explain TOP2 binding. |
2017 |
Medina, J. M.; Barranco, C. D.; Pons, O.; Sanchez, D. 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2017, ISSN: 1558-4739. Abstract | Links | BibTeX | Tags: Fuzzy databases @conference{Medina2017b, An effective way to implement a fuzzy database is on top of a classical Relational Database Management System (RDBMS). In this sense, we have proposed a Fuzzy Object Relational Database Management System (FORDBMS) [1] built on top of Oracle® RDBMS. To enhance the performance of queries based on possibility, we have carried out a study [2] to adapt indexing techniques available in classical RDBMS to fuzzy retrieval. This paper shows the implementation of the best of these indexing techniques on our FORDBMS and evaluates and compares their performance. The results show that the best of these techniques improve query execution time by several orders of magnitude with respect to sequential retrieval. |
Gómez-Vela, F.; Lopez-Fernandez, A.; Lagares, J. A.; Rodríguez-Baena, D.; Barranco, C. D.; García-Torres, M.; Divina, F. Bioinformatics from a Big Data Perspective: Meeting the Challenge Conference IWBBIO 2017: Bioinformatics and Biomedical Engineering, pp. 349-359, Springer International Publishing, Cham, 2017, ISBN: 978-3-319-56154-7. Abstract | Links | BibTeX | Tags: Big Data, Bioinformatics @conference{Gómez-Vela2017, Recently, the rise of the Big Data paradigm has had a great impact in several fields. Bioinformatics is one such field. In fact, Bioinformatics had to evolve in order to adapt to this phenomenon. The exponential increase of the biological information available forced researchers to find new solutions to handle these new challenges. |
Díaz-Montaña, J. J.; Díaz-Díaz, N.; Gómez-Vela, F. GFD-Net: A novel semantic similarity methodology for the analysis of gene networks Journal Article In: Journal of Biomedical Informatics, vol. 68, pp. 71-82, 2017, ISSN: 1532-0464. Abstract | Links | BibTeX | Tags: Gene Network @article{Díaz-Montaña2017, Since the popularization of biological network inference methods, it has become crucial to create methods to validate the resulting models. Here we present GFD-Net, the first methodology that applies the concept of semantic similarity to gene network analysis. GFD-Net combines the concept of semantic similarity with the use of gene network topology to analyze the functional dissimilarity of gene networks based on Gene Ontology (GO). The main innovation of GFD-Net lies in the way that semantic similarity is used to analyze gene networks taking into account the network topology. GFD-Net selects a functionality for each gene (specified by a GO term), weights each edge according to the dissimilarity between the nodes at its ends and calculates a quantitative measure of the network functional dissimilarity, i.e. a quantitative value of the degree of dissimilarity between the connected genes. The robustness of GFD-Net as a gene network validation tool was demonstrated by performing a ROC analysis on several network repositories. Furthermore, a well-known network was analyzed showing that GFD-Net can also be used to infer knowledge. The relevance of GFD-Net becomes more evident in Section “GFD-Net applied to the study of human diseases” where an example of how GFD-Net can be applied to the study of human diseases is presented. GFD-Net is available as an open-source Cytoscape app which offers a user-friendly interface to configure and execute the algorithm as well as the ability to visualize and interact with the results (http://apps.cytoscape.org/apps/gfdnet). |
2016 |
Medina, J. M.; Barranco, C. D.; Pons, O. Evaluation of Indexing Strategies for Possibilistic Queries Based on Indexing Techniques Available in Traditional RDBMS Journal Article In: International Journal of Intelligent Systems, vol. 31, no. 12, pp. 1135-1165, 2016. Abstract | Links | BibTeX | Tags: Fuzzy databases @article{Medina2016, A common way to implement a fuzzy database is on top of a classical relational database management system (RDBMS). Given that almost all RDBMS provide indexing mechanisms to enhance classical query processing performance, finding ways to use these mechanisms to enhance the performance of flexible query processing is of enormous interest. This work proposes and evaluates a set of indexing strategies, implemented exclusively on top of classical RDBMS indexing structures, designed to improve flexible query processing performance, focusing on the case of possibilistic queries. Results show the best indexing strategies for different data and query scenarios, offering effective ways to implement fuzzy data indexes on top of a classical RDBMS. |
Gómez-Vela, F.; Barranco, C. D.; Díaz-Díaz, N. Incorporating biological knowledge for construction of fuzzy networks of gene associations Journal Article In: Applied Soft Computing, vol. 42, pp. 144-155, 2016, ISSN: 1568-4946. Abstract | Links | BibTeX | Tags: Gene Network @article{Gómez-Vela2016, Gene association networks have become one of the most important approaches to modelling of biological processes by means of gene expression data. According to the literature, co-expression-based methods are the main approaches to identification of gene association networks because such methods can identify gene expression patterns in a dataset and can determine relations among genes. These methods usually have two fundamental drawbacks. Firstly, they are dependent on quality of the input dataset for construction of reliable models because of the sensitivity to data noise. Secondly, these methods require that the user select a threshold to determine whether a relation is biologically relevant. Due to these shortcomings, such methods may ignore some relevant information. We present a novel fuzzy approach named FyNE (Fuzzy NEtworks) for modelling of gene association networks. FyNE has two fundamental features. Firstly, it can deal with data noise using a fuzzy-set-based protocol. Secondly, the proposed approach can incorporate prior biological knowledge into the modelling phase, through a fuzzy aggregation function. These features help to gain some insights into doubtful gene relations. The performance of FyNE was tested in four different experiments. Firstly, the improvement offered by FyNE over the results of a co-expression-based method in terms of identification of gene networks was demonstrated on different datasets from different organisms. Secondly, the results produced by FyNE showed its low sensitivity to noise data in a randomness experiment. Additionally, FyNE could infer gene networks with a biological structure in a topological analysis. Finally, the validity of our proposed method was confirmed by comparing its performance with that of some representative methods for identification of gene networks. |
Tré, G. De; Billiet, C.; Bronselaer, A.; Barranco, C. D. Indexing possibilistic temporal data in a database of medieval charters Conference 2016 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2016. Abstract | Links | BibTeX | Tags: Fuzzy systems @conference{Tré2016, Querying large databases containing imperfect data requires efficient indexing techniques. Without such techniques query processing would simply take too much time. Considering a possibility based database modelling approach, imperfect data are modelled using a possibility distribution. The possibility distributions used for data modelling in different database records have to be indexed in order to support a faster processing of query conditions that act on the imperfect data. In this paper we study the indexing of imperfect temporal data in the Diplomata Belgica database, which has been co-developed by our research group. Diplomata Belgica is a relational database describing medieval charters written and issued in the southern Low Countries. More specifically, we study how imperfect data on the issuing date of a charter can be modelled and indexed in order to support searches for charters with an issuing date that is compatible with `fuzzy' query preferences provided by the user. A novel, so-called Interval B+-Tree (IBPT) indexing technique is proposed and some illustrative examples of (the handling of) complex, realistic queries are given. |
Najgebauer, P.; Korytkowski, M.; Barranco, C. D.; Scherer, R. Novel Image Descriptor Based on Color Spatial Distribution Conference Artificial Intelligence and Soft Computing, 2016. Abstract | Links | BibTeX | Tags: Content-based image retrieval @conference{Najgebauer2016, This paper proposes a new image descriptor based on color spatial distribution for image similarity comparison. It is similar to methods based on HOG and spatial pyramid but in contrast to them operates on colors and color directions instead of oriented gradients. The presented method assumes using two types of descriptors. The first one is used to describe segments of similar color and the second sub-descriptor describes connections between different adjacent segments. By this means we gain the ability to describe image parts in a more complex way than the histogram of oriented gradients (HOG) algorithm but in a more general way than keypoint-based methods such as SURF or SIFT. Moreover, in comparison to the keypoint-based methods, the proposed descriptor is less memory demanding and needs only a single step of image data processing. Descriptor comparison is more complicated but allows for descriptor ordering and for avoiding some unnecessary comparison operations. |
Díaz-Montaña, J. J.; Rackham, O. J. L.; Díaz-Díaz, N.; Petretto, E. Web-based Gene Pathogenicity Analysis (WGPA): a web platform to interpret gene pathogenicity from personal genome data Journal Article In: Bioinformatics, vol. 32, no. 4, pp. 635-637, 2016, ISSN: 1367-4803. Abstract | Links | BibTeX | Tags: Gene analysis @article{Díaz-Montaña2016, As the volume of patient-specific genome sequences increases, the focus of biomedical research is switching from the detection of disease-mutations to their interpretation. To this end a number of techniques have been developed that use mutation data collected within a population to predict whether individual genes are likely to be disease-causing or not. As both sequence data and associated analysis tools proliferate, it becomes increasingly difficult for the community to make sense of these data and their implications. Moreover, no single analysis tool is likely to capture all relevant genomic features that contribute to the gene’s pathogenicity. Here, we introduce Web-based Gene Pathogenicity Analysis (WGPA), a web-based tool to analyze genes impacted by mutations and rank them through the integration of existing prioritization tools, which assess different aspects of gene pathogenicity using population-level sequence data. Additionally, to explore the polygenic contribution of mutations to disease, WGPA implements gene set enrichment analysis to prioritize disease-causing genes and gene interaction networks, therefore providing a comprehensive annotation of personal genome data in disease. |
García-Torres, M.; Gómez-Vela, F.; Melián-Batista, B.; Moreno-Vega, J. M. High-dimensional feature selection via feature grouping: A Variable Neighborhood Search approach Journal Article In: Information Sciences, vol. 326, pp. 102-118, 2016, ISSN: 0020-0255. Abstract | Links | BibTeX | Tags: Feature selection, Metaheuristic @article{García-Torres2016, In recent years, advances in technology have led to increasingly high-dimensional datasets. This increase of dimensionality along with the presence of irrelevant and redundant features make the feature selection process challenging with respect to efficiency and effectiveness. In this context, approximate algorithms are typically applied since they provide good solutions in a reasonable time. On the other hand, feature grouping has arisen as a powerful approach to reduce dimensionality in high-dimensional data. Recently, some authors have focused their attention on developing methods that combine feature grouping and feature selection to improve the model. In this paper, we propose a feature selection strategy that utilizes feature grouping to increase the effectiveness of the search. As feature selection strategy, we propose a Variable Neighborhood Search (VNS) metaheuristic. Then, we propose to group the input space into subsets of features by using the concept of Markov blankets. To the best of our knowledge, this is the first time in which the Markov blanket is used for grouping features. We test the performance of VNS by conducting experiments on several high-dimensional datasets from two different domains: microarray and text mining. We compare VNS with popular and competitive techniques. Results show that VNS is a competitive strategy capable of finding a small size of features with similar predictive power than that obtained with other algorithms used in this study. |
2015 |
Gómez-Vela, F.; Lagares, J. A.; Díaz-Díaz, N. Gene network coherence based on prior knowledge using direct and indirect relationships Journal Article In: Computational Biology and Chemistry, vol. 56, pp. 142-151, 2015, ISSN: 1476-9271. Abstract | Links | BibTeX | Tags: Biological knowledge, Gene Network @article{Gómez-Vela2015, Gene networks (GNs) have become one of the most important approaches for modeling biological processes. They are very useful to understand the different complex biological processes that may occur in living organisms. Currently, one of the biggest challenges in any study related to GNs is to assure the quality of these GNs. In this sense, recent works use artificial data sets or a direct comparison with prior biological knowledge. However, these approaches are not entirely accurate as they only take into account direct gene–gene interactions for validation, leaving aside the weak (indirect) relationships. We propose a new measure, named gene network coherence (GNC), to rate the coherence of an input network according to different biological databases. In this sense, the measure considers not only the direct gene–gene relationships but also the indirect ones to perform a complete and fairer evaluation of the input network. Hence, our approach is able to use the whole information stored in the networks. A GNC JAVA-based implementation is available at: http://fgomezvela.github.io/GNC/. The results achieved in this work show that GNC outperforms the classical approaches for assessing GNs by means of three different experiments using different biological databases and input networks. According to the results, we can conclude that the proposed measure, which considers the inherent information stored in the direct and indirect gene–gene relationships, offers a new robust solution to the problem of GNs biological validation. |
2014 |
Medina, J. M.; Pons, J. E.; Barranco, C. D.; Pons, O. A Fuzzy Temporal Object-Relational Database: Model and Implementation Journal Article In: International Journal of Intelligent Systems, vol. 29, no. 9, pp. 836-863, 2014. Abstract | Links | BibTeX | Tags: Fuzzy databases @article{Medina2014, In the real world, some data have a specific temporal validity that must be appropriately managed. To deal with this kind of data, several proposals of temporal databases have been introduced. Moreover, time can also be affected by imprecision, vagueness, and/or uncertainty, since human beings manage time using temporal indications and temporal notions, which may also be imprecise. For this reason, information systems require appropriate support to accomplish this task. In this work, we present a novel possibilistic valid time model for fuzzy databases including the data structures, the integrity constraints, and the DML. Together with this model, we also present its implementation by means of a fuzzy valid time support module on top of a fuzzy object-relational database system. The integration of these modules makes it possible to perform queries that combine fuzzy valid time constraints with fuzzy predicates. Besides, the model and implementation proposed support the crisp valid time model as a particular case of the fuzzy valid time support provided. |
2013 |
Rodríguez-Baena, D. Extracting and validating biclusters from binary datasets Journal Article In: AI Communications, vol. 26, no. 4, pp. 417-418, 2013. Abstract | Links | BibTeX | Tags: Biclustering, Binary dataset, Biological validation @article{Rodríguez-Baena2013, This work proposes a novel algorithm to extract biclusters from binary datasets: the Bit-Pattern Biclustering Algorithm (BiBit). The selective search performed by BiBit, based on a very fast bits words processing technique, provides very satisfactory results in quality and computational cost. Besides, a new software tool, named CarGene (Characterization of Genes), that helps scientists to validate sets of genes using biological knowledge is introduced too. |
Barranco, C. D.; Medina, J. M.; Pons, J. E.; Pons, O. Building a Fuzzy Valid Time Support Module on a Fuzzy Object-Relational Database Journal Article In: Flexible Query Answering Systems, pp. 447-458, 2013, ISBN: 978-3-642-40769-7. Abstract | Links | BibTeX | Tags: Fuzzy databases @article{Barranco2013, In this work we present the implementation of a Fuzzy Valid Time Support Module on top of a Fuzzy Object-Relational Database System, based on a model to deal with imprecision in valid-time databases. The integration of these modules makes it possible to perform queries that combine fuzzy valid time constraints with fuzzy predicates. Both modules can be deployed in Oracle Relational Database Management System 10.2 and higher. The module implements the mechanisms that overload the SQL statements Insert, Update, Delete and Select to allow fuzzy temporal handling. The implementation described supports the crisp valid time model as a particular case of the fuzzy valid time support provided. |
Díaz-Díaz, N. Genes functional coherence based on actual biological knowledge Journal Article In: AI Communications, vol. 26, no. 2, pp. 247-249, 2013. Abstract | Links | BibTeX | Tags: Biological validation, Gene enrichment analysis @article{Díaz-Díaz2013, This work proposes two new approaches to establish the quality of genetic models based on current biological knowledge. First, a KEGG-based tool is developed that provides a friendly graphical environment to analyze gene enrichment. Moreover, a novel GO-based dissimilarity measure is proposed for evaluating groups of genes based on the most relevant functions of the whole set. To find this function, a heuristic approach based on Voronoi diagrams is presented. |
2012 |
Medina, J. M.; Barranco, C. D.; Campaña, J. R.; Castillo, S. J. On Modeling the Behavior of Comparators for Complex Fuzzy Objects in a Fuzzy Object-Relational Database Management System Journal Article In: International Journal of Computational Intelligence Systems, vol. 5, no. 4, pp. 762-774, 2012, ISSN: 1875-6883. Abstract | Links | BibTeX | Tags: Fuzzy databases @article{Medina2012, This paper proposes a parameterized definition for fuzzy comparators on complex fuzzy datatypes like fuzzy collections with conjunctive semantics and fuzzy objects. This definition and its implementation on a Fuzzy Object-Relational Database Management System (FORDBMS) provides the designer with a powerful tool to adapt the behavior of these operators to the semantics of the considered application. |
Medina, J. M.; Castillo, S. J.; Barranco, C. D.; Campaña, J. R. On the Use of a Fuzzy Object-Relational Database for Flexible Retrieval of Medical Images Journal Article In: IEEE Transactions on Fuzzy Systems, vol. 20, no. 4, pp. 786-803, 2012, ISSN: 1941-0034. Abstract | Links | BibTeX | Tags: Fuzzy databases @article{Medina2012b, This paper introduces a novel approach to medical image retrieval using a fuzzy object-relational database management system (FORDBMS). The system stores medical images along with information about the content of the image, such as the presence or absence of certain indicators of pathologies. It allows us to flexibly retrieve them on the basis of these indicators, making it possible to obtain images from patients with similar diagnosis and thus, following a common visual pattern. To illustrate the capabilities of the FORDBMS, this paper focuses on X-ray images of patients suffering from scoliosis (a medical condition in which the patient's spine is curved) from which spine descriptions are obtained. Then queries are performed to obtain a set of images with a certain curvature pattern. Results show high accuracy when evaluated by medical experts. Compared with other ad hoc content-based image retrieval systems, the one presented here is easily adaptable to other application domains, customizable, and very scalable. |
Barranco, C. D.; Helmer, S. An impact ordering approach for indexing fuzzy sets Journal Article In: Fuzzy Sets and Systems, vol. 196, pp. 33-46, 2012, ISSN: 0165-0114. Abstract | Links | BibTeX | Tags: Fuzzy databases @article{Barranco2012, We propose an approach for indexing fuzzy data based on inverted files that speeds up retrieval considerably by stopping the traversal of postings lists early. This is possible because the entries in the postings lists are organized in a way that guarantees that there are no matching items beyond a certain point in a list. Consequently, we can reduce the number of false positives significantly, leading to an increase in retrieval performance. We have implemented our approach and evaluated it experimentally, including a test on skewed and real-world data, comparing it to an approach that has previously been shown to be superior to other methods. |
2011 |
Díaz-Díaz, N.; Gómez-Vela, F.; Aguilar-Ruiz, J.; García-Gutiérrez, J. Gene-gene interaction based clustering method for microarray data Conference 2011 11th International Conference on Intelligent Systems Design and Applications, 2011, ISSN: 2164-7151. Abstract | Links | BibTeX | Tags: Clustering @conference{Díaz-Díaz2011b, In this paper, we propose a greedy clustering algorithm to identify groups of related genes and a new measure to improve the results of this algorithm. Clustering algorithms analyze genes in order to group those with similar behavior. Instead, our approach groups pairs of genes that present similar positive and/or negative interactions. In order to avoid noise in clusters, we apply a threshold, the neighbouring minimum index(?), to determine whether a pair of genes interacts strongly enough. The algorithm allows the researcher to modify all the criteria: discretization mapping function, gene-gene mapping function and filtering function, and even the neighbouring minimum index, and provides much flexibility to obtain clusters based on the level of precision needed. We have carried out a deep experimental study in databases to obtain a good neighbouring minimum index, ?. The performance of our approach is experimentally tested on the yeast, yeast cell-cycle and malaria datasets. The final number of clusters has a very high level of customization and the genes within them show a significant level of cohesion, as is shown graphically in the experiments. |
Gómez-Vela, F.; Martínez-Álvarez, F.; Barranco, C. D.; Díaz-Díaz, N.; Rodríguez-Baena, D.; Aguilar-Ruiz, J. Pattern Recognition in Biological Time Series Journal Article In: Advances in Artificial Intelligence, pp. 164-172, 2011, ISBN: 978-3-642-25274-7. Abstract | Links | BibTeX | Tags: Biclustering, Clustering, Gene Network @article{Gómez-Vela2011b, Knowledge extraction from gene expression data has been one of the main challenges in the bioinformatics field during the last few years. In this context, a particular kind of data, data retrieved on a temporal basis (also known as time series), provides information about the way a gene can be expressed over time. This work presents an exhaustive analysis of the latest proposals in this area, particularly focusing on those proposals using unsupervised machine learning techniques (i.e. clustering, biclustering and regulatory networks) to find relevant patterns in gene expression. |
Gómez-Vela, F.; Díaz-Díaz, N.; Aguilar-Ruiz, J. Gene Networks Validation based on Metabolic Pathways Conference 2011 IEEE 11th International Conference on Bioinformatics and Bioengineering, 2011. Abstract | Links | BibTeX | Tags: Gene Network @conference{Gómez-Vela2011, In the last few years, DNA microarray technology has attained a very important role in biological and biomedical research. It enables analyzing the relations among thousands of genes simultaneously, generating huge amounts of data. The gene networks represent, in a graph data structure, genes or gene products and the functional relationships between them. These models have been fully used in Bioinformatics because they provide an easy way to understand gene expression regulation. Nowadays, a lot of gene network algorithms have been developed as knowledge extraction techniques. A very important task in all these studies is to assure the network models reliability in order to prove that the methods used are precise. This validation process can be carried out by using the inherent information of the input data or by using public biological knowledge. In this last case, these sources of information provide a great opportunity of verifying the biological soundness of the generated networks. In this work, authors present a gene network validation methodology based on the information stored in Kegg database. With this aim, a complete Kegg pathway conversion to gene network is presented, and a global and functional validation process is proposed, where the whole metabolical information stored in Kegg is used at the same time. |
Aguilar-Ruiz, J.; Rodríguez-Baena, D.; Díaz-Díaz, N.; Nepomuceno-Chamorro, I. A. CarGene: Characterisation of sets of genes based on metabolic pathways analysis Journal Article In: International Journal of Data Mining and Bioinformatics, vol. 5, no. 5, pp. 558-573, 2011. Abstract | Links | BibTeX | Tags: Pathways analysis @article{Aguilar-Ruiz2011, The great amount of biological information provides scientists with an incomparable framework for testing the results of new algorithms. Several tools have been developed for analysing gene-enrichment and most of them are Gene Ontology-based tools. We developed a Kyoto Encyclopedia of Genes and Genomes (KEGG)-based tool that provides a friendly graphical environment for analysing gene-enrichment. The tool integrates two statistical corrections and simultaneously analyses the information about many groups of genes in both a visual and a textual manner. We tested the usefulness of our approach on a previous analysis (Huttenshower et al.). Furthermore, our tool is freely available (http://www.upo.es/eps/bigs/cargene.html). |
Díaz-Díaz, N.; Aguilar-Ruiz, J. GO-based Functional Dissimilarity of Gene Sets Journal Article In: BMC Bioinformatics, vol. 12, no. 360, 2011. Abstract | Links | BibTeX | Tags: Biological validation @article{Díaz-Díaz2011c, Background: The Gene Ontology (GO) provides a controlled vocabulary for describing the functions of genes and can be used to evaluate the functional coherence of gene sets. Many functional coherence measures consider each pair of gene functions in a set and produce an output based on all pairwise distances. A single gene can encode multiple proteins that may differ in function. For each functionality, other proteins that exhibit the same activity may also participate. Therefore, an identification of the most common function for all of the genes involved in a biological process is important in evaluating the functional similarity of groups of genes and a quantification of functional coherence can help to clarify the role of a group of genes working together. Results: To implement this approach to functional assessment, we present GFD (GO-based Functional Dissimilarity), a novel dissimilarity measure for evaluating groups of genes based on the most relevant functions of the whole set. The measure assigns a numerical value to the gene set for each of the three GO sub-ontologies. Conclusions: Results show that GFD performs robustly when applied to gene sets of known functionality (extracted from KEGG). It performs particularly well on randomly generated gene sets. An ROC analysis reveals that the performance of GFD in evaluating the functional dissimilarity of gene sets is very satisfactory. A comparative analysis against other functional measures, such as GS2 and those presented by Resnik and Wang, also demonstrates the robustness of GFD. |
Rodríguez-Baena, D.; Pérez-Pulido, A. J.; Aguilar-Ruiz, J. A biclustering algorithm for extracting bit-patterns from binary datasets Journal Article In: Bioinformatics, vol. 27, no. 19, pp. 2738-2745, 2011, ISSN: 1367-4803. Abstract | Links | BibTeX | Tags: Biclustering, Binary dataset @article{Rodríguez-Baena2011, Motivation: Binary datasets represent a compact and simple way to store data about the relationships between a group of objects and their possible properties. In the last few years, different biclustering algorithms have been specially developed to be applied to binary datasets. Several approaches based on matrix factorization, suffix trees or divide-and-conquer techniques have been proposed to extract useful biclusters from binary data, and these approaches provide information about the distribution of patterns and intrinsic correlations. Results: A novel approach to extracting biclusters from binary datasets, BiBit, is introduced here. The results obtained from different experiments with synthetic data reveal the excellent performance and the robustness of BiBit to density and size of input data. Also, BiBit is applied to a central nervous system embryonic tumor gene expression dataset to test the quality of the results. A novel gene expression preprocessing methodology, based on expression level layers, and the selective search performed by BiBit, based on a very fast bit-pattern processing technique, provide very satisfactory results in quality and computational cost. The power of biclustering in finding genes involved simultaneously in different cancer processes is also shown. Finally, a comparison with Bimax, one of the most cited binary biclustering algorithms, shows that BiBit is faster while providing essentially the same results. |
Castillo, S. J.; Medina, J. M.; Barranco, C. D.; Garrido, A. Flexible Query Answering Systems, 2011, ISBN: 978-3-642-24764-4. Abstract | Links | BibTeX | Tags: Fuzzy databases @conference{Castillo2011, In medical practice, radiologists use X-rays to diagnose and treat scoliosis, a medical condition that affects the spine. Doctors usually compare patients’ X-rays to other images with known diagnoses so that they can propose a similar treatment. Since digital medical images are usually stored in large databases, an automatic way to retrieve them could truly help radiologists. In this paper we show how a Fuzzy Object-Relational Database System can be used to provide flexible querying mechanisms to retrieve similar images. We present the main system capabilities to represent and store curvature pattern descriptions and how queries on them are solved. |
Barranco, C. D.; Campaña, J. R.; Medina, J. M. Flexible Query Answering Systems, 2011, ISBN: 978-3-642-24764-4. Abstract | Links | BibTeX | Tags: Fuzzy databases @conference{Barranco2011, This paper studies the influence of data distribution and clustering on the performance of currently available indexing methods, namely GT and HBPT, to solve necessity-measured flexible queries on numerical imprecise data. Studying these data scenarios yields valuable information about the expected performance of these indexes on real-world data and query sets, which are usually affected by different skew factors. Results reveal some sensitivity of GT to the considered data scenarios and no influence of them on HBPT. |
Díaz-Díaz, N.; Gómez-Vela, F.; Rodríguez-Baena, D.; Aguilar-Ruiz, J. Gene Regulatory Networks Validation Framework Based in KEGG Conference Hybrid Artificial Intelligent Systems, 2011, ISBN: 978-3-642-21222-2. Abstract | Links | BibTeX | Tags: Biological knowledge, Gene Network @conference{Díaz-Díaz2011, In the last few years, DNA microarray technology has attained a very important role in biological and biomedical research. It enables the analysis of relations among thousands of genes simultaneously, generating huge amounts of data. Gene regulatory networks represent, in a graph data structure, genes or gene products and the functional relationships between them. These models have been widely used in Bioinformatics because they provide an easy way to understand gene expression regulation. |