A New Evolutionary Rough Fuzzy Integrated Machine Learning Technique for microRNA selection using Next-Generation Sequencing data of Breast Cancer

Cited by: 2
Authors
Sarkar, Jnanendra Prasad [1, 6]
Saha, Indrajit [2]
Rakshit, Somnath [3]
Pal, Monalisa [4]
Wlasnowolski, Michal [3, 5]
Sarkar, Anasua [6]
Maulik, Ujjwal [6]
Plewczynski, Dariusz [3, 5]
Affiliations
[1] Larsen & Toubro Infotech Ltd, Pune, Maharashtra, India
[2] Natl Inst Tech Teachers Training & Res, Dept Comp Sci & Engn, Kolkata, India
[3] Univ Warsaw, Ctr New Technol, Warsaw, Poland
[4] Indian Stat Inst, Machine Intelligence Unit, Kolkata, India
[5] Warsaw Univ Technol, Fac Math & Informat Sci, Warsaw, Poland
[6] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, India
Funding
European Union Horizon 2020;
Keywords
Breast Cancer; Clustering; Fuzzy Set; Feature Selection; Particle Swarm Optimization; Random Forest; Rough Set; GENES; MIRNAS;
DOI
10.1145/3319619.3326836
Chinese Library Classification (CLC) number
O1 [Mathematics];
Subject classification code
0701; 070101;
Abstract
MicroRNAs (miRNAs) play an important role in various biological processes by regulating gene expression, and their abnormal expression may lead to cancer. Analysis of miRNA expression data may therefore yield biological insights useful for cancer diagnosis. Many feature selection methods have recently been developed to identify such miRNAs; each has its own merits and demerits, as the task is inherently challenging. In this article, we therefore propose a novel wrapper-based feature selection technique that integrates rough and fuzzy sets, Random Forest and Particle Swarm Optimization to identify putative miRNAs that effectively solve the underlying biological problem, i.e., separating tumour and control samples. Here, rough and fuzzy sets help to address the vagueness and overlapping characteristics of the dataset during clustering, while Random Forest is applied to classify the clustering results to yield better solutions. The integrated clustering and classification tasks are treated as the underlying optimization problem for Particle Swarm Optimization, in which particles encode features, in this case miRNAs. The performance of the proposed wrapper-based method has been demonstrated quantitatively and visually on next-generation sequencing data of breast cancer from The Cancer Genome Atlas (TCGA). Finally, the selected miRNAs are validated through biological significance tests. The code and dataset used in this paper are available online(1).
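The abstract describes a wrapper in which Particle Swarm Optimization searches over miRNA subsets and each candidate subset is scored by clustering followed by Random Forest classification. The following is a minimal Python sketch of that wrapper loop, not the authors' code. It assumes a binary PSO encoding (one 0/1 bit per feature, with a sigmoid transfer from velocity to selection probability), which is one common way to let particles encode feature subsets; the paper's exact encoding may differ. The rough-fuzzy clustering and Random Forest score is replaced by a hypothetical nearest-centroid accuracy, and the names `nearest_centroid_accuracy`, `fitness` and `binary_pso`, the feature-count penalty and the PSO parameters (`w`, `c1`, `c2`, `v_max`) are all illustrative assumptions.

```python
import math
import random


def nearest_centroid_accuracy(X, y, mask):
    """Hypothetical stand-in for the paper's clustering + Random Forest score:
    training accuracy of a nearest-centroid classifier on the selected columns."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot separate the classes
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for row, label in zip(X, y):
        point = [row[j] for j in cols]
        # Predict the class whose centroid is closest in squared Euclidean distance.
        pred = min(centroids, key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
        correct += pred == label
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    # Reward class separation; lightly penalise large subsets (assumed trade-off).
    return nearest_centroid_accuracy(X, y, mask) - penalty * sum(mask)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def binary_pso(X, y, n_particles=10, n_iter=30, w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO: each particle is a 0/1 mask over features (here, miRNAs)."""
    rng = random.Random(seed)
    d = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, y, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(d):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))  # clamp velocity
                # Sigmoid transfer: velocity -> probability that feature j is selected.
                pos[i][j] = 1 if rng.random() < sigmoid(vel[i][j]) else 0
            f = fitness(X, y, pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: 40 samples x 6 features; only features 0 and 1 differ between classes.
    y = [0] * 20 + [1] * 20
    X = [[(3.0 * label if j < 2 else 0.0) + rng.gauss(0, 1) for j in range(6)] for label in y]
    mask, score = binary_pso(X, y)
    print("selected:", [j for j, bit in enumerate(mask) if bit], "fitness:", round(score, 3))
```

To move closer to the method the abstract outlines, `fitness` would be replaced by a function that clusters the samples on the selected miRNAs (with a rough-fuzzy algorithm) and scores the result with a Random Forest classifier; the PSO loop itself would not need to change.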
Pages: 1846 - 1854
Page count: 9
Related papers (50 in total)
  • [31] Multiple Mutation Detection for Risk Assessment in Patients with Breast Cancer by Using Next-Generation Sequencing
    Liu, Peng-Fei
    Zhuo, Zhong-Ling
    Xie, Fei
    Xian, Hai-Peng
    Liu, Chang
    Wang, Shu
    Zhao, Xiao-Tao
    ANNALS OF CLINICAL AND LABORATORY SCIENCE, 2021, 51 (05) : 670 - 677
  • [32] Applications of Deep Learning and Fuzzy Systems to Detect Cancer Mortality in Next-Generation Genomic Data
    Yang, Cheng-Hong
    Moi, Sin-Hua
    Hou, Ming-Feng
    Chuang, Li-Yeh
    Lin, Yu-Da
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2021, 29 (12) : 3833 - 3844
  • [33] A statistical approach for tracking clonal dynamics in cancer using longitudinal next-generation sequencing data
    Vavoulis, Dimitrios V.
    Cutts, Anthony
    Taylor, Jenny C.
    Schuh, Anna
    BIOINFORMATICS, 2021, 37 (02) : 147 - 154
  • [34] A novel algorithm for the detection of microsatellite instability in endometrial cancer using next-generation sequencing data
    Zhou, Bing
    Wang, Yu
    Ding, Lu
    Tian, Xiaolei
    Sun, Wu
    Zhang, Wei
    Liu, Yin-Hua
    ONCOLOGY LETTERS, 2025, 29 (02)
  • [36] Estimating breast tissue-specific DNA methylation age using next-generation sequencing data
    Castle, James R.
    Lin, Nan
    Liu, Jinpeng
    Storniolo, Anna Maria V.
    Shendre, Aditi
    Hou, Lifang
    Horvath, Steve
    Liu, Yunlong
    Wang, Chi
    He, Chunyan
    CLINICAL EPIGENETICS, 2020, 12 (01)
  • [37] Estimating breast tissue-specific epigenetic age using next-generation methylation sequencing data
    Castle, James R.
    Lin, Nan
    Liu, Jingpeng
    Wang, Chi
    Liu, Yunlong
    He, Chunyan
    CANCER RESEARCH, 2019, 79 (13)
  • [38] Machine Learning Model to Track SARS-CoV-2 Viral Mutation Evolution and Speciation Using Next-generation Sequencing Data
    Derecichei, Iulian
    Atikukke, Govindaraja
    ACM-BCB 2020 - 11TH ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, 2020
  • [39] MitoScape: A big-data, machine-learning platform for obtaining mitochondrial DNA from next-generation sequencing data
    Singh, Larry N.
    Ennis, Brian
    Loneragan, Bryn
    Tsao, Noah L.
    Sanchez, M. Isabel G. Lopez
    Li, Jianping
    Acheampong, Patrick
    Tran, Oanh
    Trounce, Ian A.
    Zhu, Yuankun
    Potluri, Prasanth
    Emanuel, Beverly S.
    Rader, Daniel J.
    Arany, Zoltan
    Damrauer, Scott M.
    Resnick, Adam C.
    Anderson, Stewart A.
    Wallace, Douglas C.
    PLOS COMPUTATIONAL BIOLOGY, 2021, 17 (11)
  • [40] Characterization of RNA in exosomes secreted by human breast cancer cell lines using next-generation sequencing
    Jenjaroenpun, Piroon
    Kremenska, Yuliya
    Nair, Vrundha M.
    Kremenskoy, Maksym
    Joseph, Baby
    Kurochkin, Igor V.
    PEERJ, 2013, 1