Segmentation of multivariate mixed data via lossy data coding and compression

Cited by: 11
Authors
Ma, Yi
Derksen, Harm
Hong, Wei
Wright, John
Affiliations
[1] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL 61801 USA
[2] Univ Michigan, Dept Math, Ann Arbor, MI 48109 USA
[3] Texas Instruments Inc, DSP Solut Res & Dev Ctr, Dallas, TX 75266 USA
Funding
National Science Foundation (USA)
Keywords
multivariate mixed data; data segmentation; data clustering; rate distortion; lossy coding; lossy compression; image segmentation; microarray data clustering
DOI
10.1109/TPAMI.2007.1085
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, based on ideas from lossy data coding and compression, we present a simple but effective technique for segmenting multivariate mixed data that are drawn from a mixture of Gaussian distributions, which are allowed to be almost degenerate. The goal is to find the optimal segmentation that minimizes the overall coding length of the segmented data, subject to a given distortion. By analyzing the coding length/rate of mixed data, we formally establish some strong connections of data segmentation to many fundamental concepts in lossy data compression and rate-distortion theory. We show that a deterministic segmentation is approximately the (asymptotically) optimal solution for compressing mixed data. We propose a very simple and effective algorithm that depends on a single parameter, the allowable distortion. At any given distortion, the algorithm automatically determines the corresponding number and dimension of the groups and does not involve any parameter estimation. Simulation results reveal intriguing phase-transition-like behaviors of the number of segments when changing the level of distortion or the amount of outliers. Finally, we demonstrate how this technique can be readily applied to segment real imagery and bioinformatic data.
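The idea of choosing a segmentation by minimizing total coding length at a fixed distortion can be illustrated with a small sketch. The abstract does not state the coding-length formula, so the code below assumes a zero-mean Gaussian estimate of the form L(W) = (m + n)/2 * log2 det(I + n/(eps^2 m) W W^T) for m samples in n dimensions, plus a membership cost of -log2(m_i/m) bits per sample in group i. The function names (coding_length, total_length, greedy_segment) and the bottom-up pairwise merging strategy are illustrative assumptions, not necessarily the authors' implementation.

```python
import math


def log2_det_spd(a):
    """Return log2(det(a)) for a symmetric positive-definite matrix (Cholesky)."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(s)
                total += 2.0 * math.log2(chol[i][j])
            else:
                chol[i][j] = s / chol[j][j]
    return total


def coding_length(points, eps):
    """Assumed estimate of bits needed to code `points` up to distortion eps**2.

    Assumes the group is modeled as zero-mean Gaussian (no mean correction).
    """
    m, n = len(points), len(points[0])
    scale = n / (eps * eps * m)
    # Matrix I + scale * W W^T, where the columns of W are the points.
    mat = [[(1.0 if i == j else 0.0) + scale * sum(p[i] * p[j] for p in points)
            for j in range(n)] for i in range(n)]
    return 0.5 * (m + n) * log2_det_spd(mat)


def total_length(groups, eps):
    """Coding length of all groups plus the bits that label group membership."""
    m = sum(len(g) for g in groups)
    return sum(coding_length(g, eps) - len(g) * math.log2(len(g) / m)
               for g in groups)


def greedy_segment(points, eps):
    """Start from singletons; repeatedly apply the merge that saves the most bits."""
    groups = [[p] for p in points]
    while len(groups) > 1:
        current = total_length(groups, eps)
        best_gain, best_groups = 0.0, None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                merged = (groups[:i] + groups[i + 1:j] + groups[j + 1:]
                          + [groups[i] + groups[j]])
                gain = current - total_length(merged, eps)
                if gain > best_gain:
                    best_gain, best_groups = gain, merged
        if best_groups is None:  # no merge shortens the code: stop
            break
        groups = best_groups
    return groups


if __name__ == "__main__":
    # Toy data: six points on the x-axis and six on the y-axis.
    xs = [[float(t), 0.0] for t in (-3, -2, -1, 1, 2, 3)]
    ys = [[0.0, float(t)] for t in (-3, -2, -1, 1, 2, 3)]
    eps = 0.1  # the single tuning parameter: allowable distortion
    print("one group :", total_length([xs + ys], eps))
    print("two groups:", total_length([xs, ys], eps))
    print("greedy group sizes:", [len(g) for g in greedy_segment(xs + ys, eps)])
```

On this toy data, coding the two axes separately takes fewer bits than coding all points as one group, which is the effect the segmentation criterion exploits. The only tuning parameter is eps; the number of groups is whatever the merging loop ends with. The exhaustive pair search is kept for clarity and is only practical for small inputs.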
Pages: 1546 - 1562
Page count: 17
Related papers
50 in total
  • [21] Lossy Scientific Data Compression With SPERR
    Li, Shaomeng
    Lindstrom, Peter
    Clyne, John
    2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, IPDPS, 2023, : 1007 - 1017
  • [22] Lossy compression of acoustic backscatter data
    Goldschneider, JR
    Bruce, AG
    Percival, DB
    DETECTION AND REMEDIATION TECHNOLOGIES FOR MINES AND MINELIKE TARGETS II, 1997, 3079 : 213 - 224
  • [23] WAVELET LOSSY COMPRESSION OF RANDOM DATA
    Andrecut, M.
    INTERNATIONAL JOURNAL OF MODERN PHYSICS C, 2009, 20 (01): : 109 - 116
  • [24] Fractal Image Compression Method for Lossy Data Compression
    Artuger, Firat
    Ozkaynak, Fatih
    2018 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP), 2018,
  • [25] Information Theoretic Modeling of High Precision Disparity Data for Lossy Compression and Object Segmentation
    Tabus, Ioan
    Kaya, Emre Can
    ENTROPY, 2019, 21 (11)
  • [26] Lossy data compression for next-generation imager data
    Miller, SW
    Puschell, JJ
    ATMOSPHERIC AND ENVIRONMENTAL REMOTE SENSING DATA PROCESSING AND UTILIZATION: AN END TO END SYSTEM PERSPECTIVE, 2004, 5548 : 120 - 127
  • [27] Parallel Implementation of Lossy Data Compression for Temporal Data Sets
    Yuan, Zheng
    Hendrix, William
    Son, Seung Woo
    Federrath, Christoph
    Agrawal, Ankit
    Liao, Wei-keng
    Choudhary, Alok
    PROCEEDINGS OF 2016 IEEE 23RD INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2016, : 62 - 71
  • [28] PRELIMINARY-STUDY ON INFORMATION LOSSY AND LOSS-LESS CODING DATA-COMPRESSION FOR THE ARCHIVING OF ADEOS DATA
    ARAI, K
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 1990, 28 (04): : 732 - 734
  • [29] Coding of SAR Image Data for Data Compression
    Pestel-Schiller, Ulrike
    10TH EUROPEAN CONFERENCE ON SYNTHETIC APERTURE RADAR (EUSAR 2014), 2014,
  • [30] Lossy predictive coding of SAR raw data
    Magli, E
    Olmo, G
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2003, 41 (05): : 977 - 987