Enhanced Data Mining and Visualization of Sensory-Graph-Modeled Datasets through Summarization

Times cited: 3
Authors
Hashmi, Syed Jalaluddin [1]
Alabdullah, Bayan [2]
Al Mudawi, Naif [3]
Algarni, Asaad [4]
Jalal, Ahmad [5]
Liu, Hui [6]
Affiliations
[1] Natl Univ Comp & Emerging Sci, Sch Comp, Islamabad 44000, Pakistan
[2] Princess Nourah Bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Informat Syst, POB 84428, Riyadh 11671, Saudi Arabia
[3] Najran Univ, Coll Comp Sci & Informat Syst, Dept Comp Sci, Najran 55461, Saudi Arabia
[4] Northern Border Univ, Fac Comp & Informat Technol, Dept Comp Sci, Rafha 91911, Saudi Arabia
[5] Air Univ, Fac Comp & AI, E9, Islamabad 44000, Pakistan
[6] Univ Bremen, Cognit Syst Lab, D-28359 Bremen, Germany
Keywords
sensors datasets; Bio-Mouse-Gene; data visualization; big data; data mining; graph summarization; weighted LSH; correction sets; storage
DOI
10.3390/s24144554
Chinese Library Classification (CLC) number
O65 [Analytical chemistry]
Discipline classification codes
070302; 081704
Abstract
The acquisition, processing, mining, and visualization of sensory data for knowledge discovery and decision support has recently been a popular area of research. Its value stems largely from its role in the ongoing improvement of healthcare and related disciplines. As a result, huge amounts of data have been collected and analyzed. These data are made available to the research community in various shapes and formats, and representing and studying them as graphs or networks is itself an active area of research. However, the large size of such graph datasets poses challenges for data mining and visualization. For example, knowledge discovery from the Bio-Mouse-Gene dataset, which has over 43 thousand nodes and 14.5 million edges, is a non-trivial task. Summarizing such large graphs is a useful alternative: graph summarization enables efficient analysis of complex, large-scale data by merging nodes that have similar structural properties. Traditional methods, however, often overlook personalization of the summary, which would help highlight specific target nodes. Personalized or context-specific scenarios require a more tailored approach to accurately capture distinct patterns and trends. Personalized graph summarization therefore aims to obtain a concise representation of the graph that emphasizes connections closer to a given set of target nodes. In this paper, we present IPGS, a faster algorithm for the personalized graph summarization (PGS) problem, designed to support enhanced and effective data mining and visualization of datasets from various domains, including biosensors.
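The abstract describes personalization only as emphasizing connections near a given set of target nodes. As a minimal sketch of one way to quantify that proximity, assuming an undirected graph stored as an adjacency dict, the hypothetical helpers `target_distances` and `personalized_edge_weights` below run a multi-source breadth-first search from the target nodes and give each edge a weight that decays geometrically with the hop distance of its closer endpoint. The decay rule and the base `alpha` are illustrative assumptions, not the weighting used by PGS or IPGS.

```python
from collections import deque


def target_distances(adj, targets):
    """Multi-source BFS: hop distance from every reachable node to its nearest target."""
    dist = {t: 0 for t in targets}
    queue = deque(dist)  # start the search from all targets at once
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def personalized_edge_weights(adj, targets, alpha=2.0):
    """Weight each undirected edge (u, v), u < v, by alpha ** -d.

    d is the hop distance from the edge's closer endpoint to the nearest target.
    ASSUMPTION: this geometric decay is illustrative, not the PGS/IPGS definition.
    Edges in components that contain no target are omitted.
    """
    dist = target_distances(adj, targets)
    weights = {}
    for u, neighbors in adj.items():
        for v in neighbors:
            if u < v and (u in dist or v in dist):
                d = min(dist.get(u, float("inf")), dist.get(v, float("inf")))
                weights[(u, v)] = alpha ** -d
    return weights


# Usage: path 0-1-2-3 with node 0 as the only target.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(personalized_edge_weights(adj, {0}))  # {(0, 1): 1.0, (1, 2): 0.5, (2, 3): 0.25}
```

Weights like these could let a summarizer penalize errors near the targets more heavily than errors far from them; that use is likewise an assumption of this sketch.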
Our objective is to achieve a compression ratio similar to that of the state-of-the-art PGS algorithm in less time. To this end, we reduce the execution time of the current state-of-the-art approach by using weighted locality-sensitive hashing (LSH), and we evaluate the result through experiments on eight large, publicly available datasets. The experiments demonstrate the effectiveness and scalability of IPGS, which provides a compression ratio similar to that of the state-of-the-art approach. In this way, our research contributes to the study and analysis of sensory datasets from the perspective of graph summarization. We also present a detailed study on the Bio-Mouse-Gene dataset to investigate the effectiveness of graph summarization in the domain of biosensors.
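The abstract states that IPGS speeds up PGS with weighted locality-sensitive hashing but does not name the scheme. As an assumption, the sketch below swaps in Improved Consistent Weighted Sampling (ICWS, Ioffe 2010), a well-known weighted MinHash whose collision probability equals the weighted Jaccard similarity of two weighted sets. Each node is represented by a weighted neighborhood vector, and nodes whose signatures agree on every row of some band land in the same bucket, yielding candidate groups for merging without comparing all node pairs. The function names and the parameters `num_hashes` and `rows_per_band` are hypothetical.

```python
import math
import random
from collections import defaultdict


def icws_sample(weights, seed):
    """One ICWS draw (Ioffe, 2010) from a weighted set {element: positive weight}.

    Two weighted sets return the same (element, t) pair with probability equal
    to their weighted Jaccard similarity sum(min) / sum(max).
    """
    best = None
    for k, s in weights.items():
        if s <= 0:
            continue
        # Seeding by (seed, k) gives element k the same random draws in every set.
        # A fresh Random per element is slow but keeps the sketch simple.
        rng = random.Random(f"{seed}:{k}")
        r = rng.gammavariate(2.0, 1.0)
        c = rng.gammavariate(2.0, 1.0)
        beta = rng.random()
        t = math.floor(math.log(s) / r + beta)
        y = math.exp(r * (t - beta))
        a = c / (y * math.exp(r))
        if best is None or a < best[0]:
            best = (a, k, t)
    return None if best is None else (best[1], best[2])


def weighted_minhash_signature(weights, num_hashes):
    """Signature made of num_hashes independent ICWS draws."""
    return tuple(icws_sample(weights, seed) for seed in range(num_hashes))


def lsh_candidate_groups(node_vectors, num_hashes=8, rows_per_band=2):
    """Group nodes whose signatures agree on every row of at least one band."""
    buckets = defaultdict(set)
    for node, vec in node_vectors.items():
        sig = weighted_minhash_signature(vec, num_hashes)
        for start in range(0, num_hashes, rows_per_band):
            buckets[(start, sig[start:start + rows_per_band])].add(node)
    return [group for group in buckets.values() if len(group) > 1]


# Usage: "a" and "b" have identical weighted neighborhoods, "c" shares nothing.
vectors = {"a": {10: 1.0, 11: 0.5}, "b": {10: 1.0, 11: 0.5}, "c": {20: 2.0}}
print(lsh_candidate_groups(vectors))
```

With b bands of r rows, two nodes with weighted Jaccard similarity s share at least one bucket with probability 1 - (1 - s^r)^b, so raising `rows_per_band` makes buckets stricter while adding bands recovers recall.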
Pages: 24