Algorithm for Processing Audio Signals Using Machine Learning

Cited by: 0
Authors
Sokolskyi, S. O.
Movchanyuk, A. V.
Affiliations
Keywords
drone; small unmanned aerial vehicle; spectrum; signal processing; signal detection; convolutional neural networks; deep learning;
DOI
10.20535/RADAP.2023.93.39-51
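Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code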
0808 ; 0809 ;
Abstract
Small unmanned aerial vehicles (UAVs) are developing rapidly and are being adopted in various industries to make people's lives easier. However, their use carries potential risks, such as unauthorized surveillance of critical infrastructure objects and the delivery of explosive devices, which pose a significant threat to public and national security. The acoustic method is a promising direction for solving this issue: it analyzes the sound characteristics and Doppler shift signatures of UAVs using microphone arrays and machine learning techniques. The aim of this article is to develop an algorithm for effective detection and classification of drone audio signals using a deep learning convolutional neural network (CNN), construct its architecture, and evaluate its performance. Before the drone audio dataset is fed into the neural network, the quality of the audio recordings is improved through normalization, Wiener filtering, and segmentation. The audio is segmented into frames with a duration of 25 ms and a 50% overlap, with Hamming windowing applied for better accuracy in the time domain, as temporal precision is crucial in audio signal processing. The obtained data is divided into three sets in a 60/20/20 ratio for training, validation, and testing. Next, the data is represented by a simplified set of features: mel-spectrograms are extracted from each frame of the processed audio signals to capture their temporal and spectral characteristics. The frequency range of analysis corresponds to the working frequency limits of the microphone model (20 Hz - 20 kHz), with a frequency resolution of 50 Hz and 30 working mel frequency bands. Using the training data and the extracted audio features, a neural network architecture is developed to investigate the performance of the drone detection and classification algorithm. It consists of 10 pairs of convolutional layers, ReLU activation, batch normalization, and max-pooling layers.
The number of these layers is determined by the size of the pooling window along the time dimension. This is followed by flattening, dropout, fully connected, and Softmax layers. A classification layer is applied to normalize the output data and obtain final probabilities. The Adam optimizer is chosen for model training. Based on the dataset, the initial learning rate is set to 0.001, gradually decreasing by a factor of 10 after 75% of the epochs to enhance convergence. Recognition accuracy on the input data reaches 99%, and the F1 score of the trained model is 0.93, indicating a high level of overall architecture performance. The maximum distance at which the algorithm effectively detects drones is 200 m.
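Two of the steps named in the abstract, framing with a Hamming window and the learning-rate drop, can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation. The 48 kHz sample rate, the 40-epoch run, the synthetic test tone, and the function names `hamming`, `frame_signal`, and `learning_rate` are all assumptions made for illustration; the abstract states only the 25 ms frame length, the 50% overlap, the initial rate of 0.001, and the tenfold reduction after 75% of the epochs. The drop is modeled here as a single step, which is one possible reading of the abstract.

```python
import math


def hamming(n_points):
    """Return a symmetric Hamming window with n_points samples."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def frame_signal(samples, sample_rate, frame_ms=25.0, overlap=0.5):
    """Split samples into overlapping frames and apply a Hamming window."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    window = hamming(frame_len)
    frames = []
    # Only full-length frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


def learning_rate(epoch, total_epochs, initial_lr=0.001,
                  drop_factor=10.0, drop_at=0.75):
    """Return the learning rate for a zero-based epoch index (single step drop)."""
    drop_epoch = int(total_epochs * drop_at)
    if epoch >= drop_epoch:
        return initial_lr / drop_factor
    return initial_lr


# Assumed values, not taken from the abstract.
sample_rate = 48000   # Hz, hypothetical
total_epochs = 40     # hypothetical

# One second of a synthetic 200 Hz tone stands in for a drone recording.
signal = [math.sin(2 * math.pi * 200 * t / sample_rate)
          for t in range(sample_rate)]

frames = frame_signal(signal, sample_rate)
schedule = [learning_rate(e, total_epochs) for e in range(total_epochs)]

print(len(frames), len(frames[0]))
print(schedule[0], schedule[-1])
```

With these assumed values, each frame holds 1200 samples and consecutive frames start 600 samples apart. Mel-spectrogram extraction and the CNN itself are left out because the abstract does not give enough detail to reproduce them.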
Pages: 39-51
Page count: 13
Related papers
50 in total
  • [31] Classification of Microseismic Signals Using Machine Learning
    Chen, Ziyang
    Cui, Yi
    Pu, Yuanyuan
    Rui, Yichao
    Chen, Jie
    Mengli, Deren
    Yu, Bin
    PROCESSES, 2024, 12 (06)
  • [32] Classification of Cardiotocography Signals Using Machine Learning
    Sontakke, Sumedh Anand
    Lohokare, Jay
    Dani, Reshul
    Shivagaje, Pranav
    INTELLIGENT SYSTEMS AND APPLICATIONS, INTELLISYS, VOL 2, 2019, 869 : 439 - 450
  • [33] Embedding data in audio signals using HSA-EMD Algorithm
    Selvi, R. Senthamizh
    Kishore, R.
    Suresh, G. R.
    Suba, S. Kanaga
    2017 THIRD INTERNATIONAL CONFERENCE ON SCIENCE TECHNOLOGY ENGINEERING & MANAGEMENT (ICONSTEM), 2017, : 384 - 388
  • [34] A summation algorithm for MPEG-1 coded audio signals: a first step towards audio processing in the compressed domain
    Touimi, AB
    Mahieux, Y
    Lanciani, CA
    ANNALES DES TELECOMMUNICATIONS-ANNALS OF TELECOMMUNICATIONS, 2000, 55 (3-4): : 108 - 116
  • [35] Choosing best algorithm combinations for speech processing tasks in machine learning using MARF
    Mokhov, Serguei A.
    ADVANCES IN ARTIFICIAL INTELLIGENCE, 2008, 5032 : 216 - 221
  • [36] Evaluation of Maturation in Preterm Infants Through an Ensemble Machine Learning Algorithm Using Physiological Signals
    Leon, Cristhyne
    Cabon, Sandie
    Patural, Hugues
    Gascoin, Geraldine
    Flamant, Cyril
    Roue, Jean-Michel
    Favrais, Geraldine
    Beuchee, Alain
    Pladys, Patrick
    Carrault, Guy
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2022, 26 (01) : 400 - 410
  • [37] COVID-19 detection from optimized features of breathing audio signals using explainable ensemble machine learning
    Sultana, Shafrin
    Hossain, A. B. M. Aowlad
    Alam, Jahangir
    RESULTS IN CONTROL AND OPTIMIZATION, 2025, 18
  • [38] MICROPROCESSOR MIXING AND PROCESSING OF DIGITAL AUDIO SIGNALS
    MCNALLY, GW
    JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 1979, 27 (10): : 793 - 803
  • [39] Musical Gesture Recognition Using Machine Learning and Audio Descriptors
    Best, Paul
    Bresson, Jean
    Schwarz, Diemo
    2018 16TH INTERNATIONAL CONFERENCE ON CONTENT-BASED MULTIMEDIA INDEXING (CBMI), 2018,
  • [40] Capacity Estimation from Environmental Audio Signals Using Deep Learning
    Reyes-Daneri, C.
    Martinez-Murcia, F. J.
    Ortiz, A.
    ARTIFICIAL INTELLIGENCE IN NEUROSCIENCE: AFFECTIVE ANALYSIS AND HEALTH APPLICATIONS, PT I, 2022, 13258 : 114 - 124