Institutional Repository

One-class SVM and supervised machine learning models for uncovering associations of non-coding RNA with diseases

dc.contributor.advisor Wang, Zenghui
dc.contributor.author Gutiérrez Cárdenas, Juan Manuel
dc.date.accessioned 2022-02-09T12:27:35Z
dc.date.available 2022-02-09T12:27:35Z
dc.date.issued 2022-01
dc.identifier.uri https://hdl.handle.net/10500/28539
dc.description.abstract The study of microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and their interactions with genes is expected to yield valuable biomarkers for personalized treatment of diseases and to aid in the prognosis of certain conditions. These molecules act at the genome level by regulating or suppressing protein expression. The primary challenge in studying these non-coding molecules is the need for labeled data indicating positive and negative interactions when predicting interactions with machine-learning or deep-learning techniques. In practice, however, the available data are usually unbalanced or otherwise unstable for use with these models. An additional problem involves the extraction of features derived from the binding of these non-coding RNAs and genes. In animals, this binding usually occurs fully or partially, which adds considerable complexity to studying the process. Therefore, the main objective of the present work is to demonstrate that features extracted from miRNA sequences can be used to study the development of diseases such as breast cancer and breast neoplasms, and to examine whether miRNAs have any influence on immune genes related to SARS-CoV-2. We performed experiments focusing on the erb-b2 receptor tyrosine kinase 2 (ERBB2) gene involved in breast cancer. For this purpose, we gathered information on the binding between miRNAs and messenger RNAs (mRNAs). In this part of our research, we applied a One-Class SVM and an Isolation Forest to discriminate between weak interactions, outliers given by the one-class model, and strong interactions that could occur between miRNA and mRNA. Additionally, this study aimed to differentiate between breast cancer cases and breast neoplasm conditions. For this part, we used the information encoded in lncRNAs. 
The additional feature used in this part was the frequency of k-mers, i.e., short nucleotide subsequences of length k, along with the energy released in miRNA folding. The models used to discriminate between these diseases were One-Class SVM, SVM, and Random Forest. In the final part of the present work, we described a subset of miRNAs that probably bind to SARS-CoV-2 RNA, focusing on those miRNAs related to genes involved in the human immune system. The models used as classifiers were One-Class SVM, SVM, and Random Forest. The results obtained in the present study are comparable to those found in the current literature and demonstrate the feasibility of using one-class models combined with features from the coupling of non-coding genes or mRNAs and their relationships with forms of breast cancer and viral infections. This work is expected to establish a basis for future research that applies one-class machine-learning models, with feature extraction based on genomic sequences, to the study of the relationship between non-coding RNAs and various diseases. en
dc.format.extent 1 online resource (xiii, 104 leaves) : illustrations (chiefly color), color graphs en
dc.language.iso en en
dc.subject mRNAs en
dc.subject lncRNAs en
dc.subject K-mers en
dc.subject Sequence features en
dc.subject Breast neoplasms en
dc.subject Breast cancer en
dc.subject SARS-CoV-2 en
dc.subject One-class models en
dc.subject Supervised learning en
dc.subject Unsupervised learning en
dc.subject.ddc 615.8950285631
dc.subject.lcsh Non-coding RNA -- Data processing en
dc.subject.lcsh Breast -- Cancer -- Gene therapy -- Data processing en
dc.subject.lcsh COVID-19 (Disease) -- Gene therapy -- Data processing en
dc.subject.lcsh Support vector machines en
dc.subject.lcsh Machine learning en
dc.title One-class SVM and supervised machine learning models for uncovering associations of non-coding RNA with diseases en
dc.type Thesis en
dc.description.department School of Computing en
dc.description.degree Ph. D. (Computing)

