A New Evolutionary Rough Fuzzy Integrated Machine Learning Technique for microRNA selection using Next-Generation Sequencing data of Breast Cancer

Cited by: 2
Authors
Sarkar, Jnanendra Prasad [1 ,6 ]
Saha, Indrajit [2 ]
Rakshit, Somnath [3 ]
Pal, Monalisa [4 ]
Wlasnowolski, Michal [3 ,5 ]
Sarkar, Anasua [6 ]
Maulik, Ujjwal [6 ]
Plewczynski, Dariusz [3 ,5 ]
Affiliations
[1] Larsen & Toubro Infotech Ltd, Pune, Maharashtra, India
[2] Natl Inst Tech Teachers Training & Res, Dept Comp Sci & Engn, Kolkata, India
[3] Univ Warsaw, Ctr New Technol, Warsaw, Poland
[4] Indian Stat Inst, Machine Intelligence Unit, Kolkata, India
[5] Warsaw Univ Technol, Fac Math & Informat Sci, Warsaw, Poland
[6] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, India
Funding
European Union Horizon 2020;
Keywords
Breast Cancer; Clustering; Fuzzy Set; Feature Selection; Particle Swarm Optimization; Random Forest; Rough Set; GENES; MIRNAS;
DOI
10.1145/3319619.3326836
Chinese Library Classification
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
MicroRNAs (miRNAs) play an important role in various biological processes by regulating gene expression. Their abnormal expression may lead to cancer. Therefore, analysis of miRNA expression data may reveal biological insight for cancer diagnosis. Many feature selection methods have recently been developed to identify such miRNAs. Each has its own merits and demerits, as the task is very challenging. Thus, in this article, we propose a novel wrapper-based feature selection technique that integrates Rough and Fuzzy sets, Random Forest and Particle Swarm Optimization to identify putative miRNAs that can solve the underlying biological problem effectively, i.e. to separate tumour and control samples. Here, Rough and Fuzzy sets help to address the vagueness and overlapping characteristics of the dataset while performing clustering. Random Forest, in turn, is applied to perform the classification task on the clustering results to yield better solutions. The integrated clustering and classification tasks are treated as the underlying optimization problem for the Particle Swarm Optimization method, where particles encode features, in this case miRNAs. The performance of the proposed wrapper-based method has been demonstrated quantitatively and visually on next-generation sequencing data of breast cancer from The Cancer Genome Atlas (TCGA). Finally, the selected miRNAs are validated through biological significance tests. The code and dataset used in this paper are available online (1).
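The wrapper idea described in the abstract, where each particle encodes a subset of features and is scored by a classifier, can be illustrated with a minimal sketch. This is not the authors' implementation: the rough-fuzzy clustering and Random Forest stages are replaced here by a simple leave-one-out nearest-centroid classifier so that the example needs only the Python standard library. The function names (`centroid_accuracy`, `fitness`, `binary_pso`), the size penalty, and the PSO constants (inertia 0.7, acceleration 1.5, velocity clamp 4) are all assumptions chosen for illustration.

```python
import math
import random


def centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the
    features selected by `mask` (stand-in for the paper's classifier)."""
    idx = [j for j, m in enumerate(mask) if m]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    n = len(X)
    correct = 0
    for i in range(n):
        sums, counts = {}, {}
        for k in range(n):
            if k == i:
                continue  # hold out sample i
            s = sums.setdefault(y[k], [0.0] * len(idx))
            for t, j in enumerate(idx):
                s[t] += X[k][j]
            counts[y[k]] = counts.get(y[k], 0) + 1
        best_label, best_dist = None, math.inf
        for label, s in sums.items():
            d = sum((X[i][j] - s[t] / counts[label]) ** 2
                    for t, j in enumerate(idx))
            if d < best_dist:
                best_label, best_dist = label, d
        correct += best_label == y[i]
    return correct / n


def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small cost per selected feature (assumed form)."""
    return centroid_accuracy(X, y, mask) - penalty * sum(mask)


def binary_pso(X, y, n_particles=12, n_iter=30, seed=0):
    """Binary PSO: each particle is a 0/1 mask over the feature columns."""
    rng = random.Random(seed)
    d = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, y, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(d):
                v = (0.7 * vel[i][j]
                     + 1.5 * rng.random() * (pbest[i][j] - pos[i][j])
                     + 1.5 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, v))  # clamp the velocity
                # A sigmoid of the velocity gives the probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(X, y, pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit
```

Calling `binary_pso(X, y)` on a list of expression rows `X` and class labels `y` returns the best mask found and its fitness. In a setting closer to the abstract, the scoring function would be swapped for one built on the clustering and Random Forest stages.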
Pages: 1846-1854
Number of pages: 9