Possibilistic Similarity Measures for Data Science and Machine Learning Applications

Cited by: 4
Authors
Charfi, Amal [1 ]
Bouhamed, Sonda Ammar [1 ,2 ]
Bosse, Eloi [2 ,3 ]
Kallel, Imene Khanfir [1 ,2 ]
Bouchaala, Wassim [4 ]
Solaiman, Basel [2 ]
Derbel, Nabil [1 ]
Affiliations
[1] Univ Sfax, Natl Sch Engineers Sfax, Control & Energy Management CEM Lab, Sfax 3038, Tunisia
[2] IMT Atlantique, Image & Informat Proc Dept iTi, F-838182923 Brest, France
[3] Expertises Parafuse Inc, Quebec City, PQ G1W 4N1, Canada
[4] Tunisian Profess Training Agcy, Sfax 3000, Tunisia
Keywords
Uncertainty; Possibility theory; Measurement uncertainty; Machine learning; Atmospheric measurements; Particle measurements; Indexes; Classification; distance; entropy; learning; measures of specificity; possibility distributions; similarity; uncertainty; INFORMATION; UNCERTAINTY; NOTION;
DOI
10.1109/ACCESS.2020.2979553
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Measuring similarity is of great interest in many research areas, including data science, machine learning, pattern recognition, text analysis, and information retrieval. The literature has shown that possibility is an attractive notion in the context of distinguishability assessment and can lead to very efficient and computationally inexpensive learning schemes. This paper focuses on determining the similarity between two possibility distributions. A review of existing similarity measures within the possibilistic framework is presented first. These measures are then analyzed with respect to their capacity to satisfy a set of properties that a similarity measure should have. Most of the existing possibilistic similarity measures produce undesirable outcomes since they generally depend on the application context. A new similarity measure, called InfoSpecificity, is introduced, and the similarity measures are categorized into three main methods: morphic-based, amorphic-based, and hybrid. Two experiments are conducted using four benchmark databases. The aim of the experiments is to compare the efficiency of the possibilistic similarity measures when applied to real data. The experiments show good results for the hybrid methods, particularly with the InfoSpecificity measure. In general, the hybrid methods outperform the other two categories when evaluated on small-size samples, i.e., in a poor-data context (or poorly informed environment), where possibility theory can be used to greatest benefit.
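As an illustration of what comparing two possibility distributions can look like, the following minimal sketch computes a generic distance-based similarity (one minus the mean absolute difference) between two discrete distributions over the same set of classes. This is not the InfoSpecificity measure, whose definition is not given in this record; the function names `is_normalized` and `manhattan_similarity` and the sample values are assumptions made purely for illustration.

```python
from typing import Sequence


def is_normalized(pi: Sequence[float], tol: float = 1e-9) -> bool:
    """Check that all degrees lie in [0, 1] and that the largest degree is 1."""
    in_range = all(-tol <= p <= 1.0 + tol for p in pi)
    return in_range and abs(max(pi) - 1.0) <= tol


def manhattan_similarity(pi1: Sequence[float], pi2: Sequence[float]) -> float:
    """Hypothetical helper: 1 minus the mean absolute difference.

    Returns 1.0 for identical distributions and smaller values as the
    two distributions diverge. Illustrative only, not the paper's measure.
    """
    if len(pi1) != len(pi2) or len(pi1) == 0:
        raise ValueError("distributions must be non-empty and of equal length")
    total = sum(abs(a - b) for a, b in zip(pi1, pi2))
    return 1.0 - total / len(pi1)


if __name__ == "__main__":
    # Two made-up possibility distributions over three classes.
    pi_a = [1.0, 0.5, 0.0]
    pi_b = [1.0, 0.2, 0.3]
    print(is_normalized(pi_a), is_normalized(pi_b))
    print(manhattan_similarity(pi_a, pi_b))
```

A purely distance-based measure such as this one compares the two distributions degree by degree; the abstract's hybrid category suggests that the paper combines more than one kind of information, which this sketch does not attempt.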
Pages: 49198-49211
Page count: 14