Enhancing User Experience on Q&A Platforms: Measuring Text Similarity Based on Hybrid CNN-LSTM Model for Efficient Duplicate Question Detection

Cited by: 4
Authors
Faseeh, Muhammad [1]
Khan, Murad Ali [2]
Iqbal, Naeem [3]
Qayyum, Faiza [2]
Mehmood, Asif [4]
Kim, Jungsuk [4, 5]
Affiliations
[1] Jeju Natl Univ, Dept Elect Engn, Jeju Si 63243, South Korea
[2] Jeju Natl Univ, Dept Comp Engn, Jeju Si 63243, South Korea
[3] Queens Univ Belfast, Sch Elect Elect Engn & Comp Sci, Belfast BT7 1NN, Northern Ireland
[4] Gachon Univ, Coll IT Convergence, Dept Biomed Engn, Seongnam Si 13120, South Korea
[5] Cellico Res & Dev Lab, Sungnam Si 13449, South Korea
Keywords
Deep learning; Semantics; Brain modeling; Task analysis; Feature extraction; Convolutional neural networks; Syntactics; Natural language processing; Question answering (information retrieval); Duplicate question identification; stack overflow; deep learning (DL); word embeddings; natural language processing (NLP); question-and-answer (QA) platforms; TWEETS
DOI
10.1109/ACCESS.2024.3358422
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This research introduces an innovative approach for identifying duplicate questions within the Stack Overflow community, a challenging task in NLP. Leveraging deep learning techniques, our proposed methodology combines Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks to capture both local and long-term dependencies in textual data. We employ word embeddings, specifically Google's Word2Vec and GloVe, to enhance text representation. Extensive experiments on the Stack Overflow dataset demonstrate the effectiveness of our approach, achieving an impressive accuracy of 87.09% and a recall rate of 87.%. The integration of CNN and LSTM models significantly streamlines preprocessing, making it a valuable tool for detecting duplicate questions. Future directions include extending the model to multiple languages and exploring alternative word embedding techniques. Our approach presents promising applications beyond Stack Overflow, offering solutions for identifying similar questions on various QA platforms.
Pages: 34512-34526
Page count: 15
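
The abstract describes embedding each question, extracting local features with a CNN, and modeling longer-range dependencies with an LSTM. The following is a minimal NumPy sketch of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the toy vocabulary, the layer sizes, the random untrained weights standing in for pretrained Word2Vec or GloVe vectors, and the use of cosine similarity between the two question encodings as the similarity score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary; a real system would load pretrained embeddings.
VOCAB = {"<unk>": 0, "how": 1, "to": 2, "reverse": 3, "a": 4,
         "list": 5, "in": 6, "python": 7, "invert": 8}
EMB_DIM, N_FILTERS, KERNEL, HIDDEN = 8, 6, 3, 5

# Random stand-ins for pretrained embeddings and trained weights (assumption).
E = rng.normal(size=(len(VOCAB), EMB_DIM))
W_conv = rng.normal(size=(KERNEL * EMB_DIM, N_FILTERS)) * 0.1
b_conv = np.zeros(N_FILTERS)
W_lstm = rng.normal(size=(N_FILTERS + HIDDEN, 4 * HIDDEN)) * 0.1
b_lstm = np.zeros(4 * HIDDEN)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def embed(text):
    """Map whitespace tokens to embedding rows; unknown words map to <unk>."""
    ids = [VOCAB.get(tok, 0) for tok in text.lower().split()]
    x = E[ids]
    if len(x) < KERNEL:  # zero-pad short inputs so one conv window exists
        pad = np.zeros((KERNEL - len(x), EMB_DIM))
        x = np.vstack([x, pad])
    return x


def conv1d_relu(x):
    """CNN stage: each window of KERNEL tokens yields N_FILTERS local features."""
    windows = [x[i:i + KERNEL].reshape(-1) for i in range(len(x) - KERNEL + 1)]
    return np.maximum(np.stack(windows) @ W_conv + b_conv, 0.0)


def lstm_last_hidden(seq):
    """LSTM stage: run over the conv features and return the final hidden state."""
    h = np.zeros(HIDDEN)
    c = np.zeros(HIDDEN)
    for x_t in seq:
        gates = np.concatenate([x_t, h]) @ W_lstm + b_lstm
        i, f, o, g = np.split(gates, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


def encode(text):
    """Embedding -> CNN -> LSTM, giving one fixed-size vector per question."""
    return lstm_last_hidden(conv1d_relu(embed(text)))


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))


q1 = encode("how to reverse a list in python")
q2 = encode("how to invert a list in python")
score = cosine(q1, q2)  # similarity in [-1, 1]; meaningless until weights are trained
print(f"similarity score: {score:.3f}")
```

Because the weights are random, the printed score only shows the data flow. In practice the encoder would be trained on labeled duplicate and non-duplicate question pairs, and the decision threshold would be chosen on a validation set.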