Segmentation of Multivariate Mixed Data via Lossy Data Coding and Compression

Times cited: 11
Authors
Ma, Yi
Derksen, Harm
Hong, Wei
Wright, John
Affiliations
[1] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL 61801 USA
[2] Univ Michigan, Dept Math, Ann Arbor, MI 48109 USA
[3] Texas Instruments Inc, DSP Solut Res & Dev Ctr, Dallas, TX 75266 USA
Funding
National Science Foundation (NSF), USA
Keywords
multivariate mixed data; data segmentation; data clustering; rate distortion; lossy coding; lossy compression; image segmentation; microarray data clustering
DOI
10.1109/TPAMI.2007.1085
Chinese Library Classification (CLC) code
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, based on ideas from lossy data coding and compression, we present a simple but effective technique for segmenting multivariate mixed data that are drawn from a mixture of Gaussian distributions, which are allowed to be almost degenerate. The goal is to find the optimal segmentation that minimizes the overall coding length of the segmented data, subject to a given distortion. By analyzing the coding length/rate of mixed data, we formally establish some strong connections of data segmentation to many fundamental concepts in lossy data compression and rate-distortion theory. We show that a deterministic segmentation is approximately the (asymptotically) optimal solution for compressing mixed data. We propose a very simple and effective algorithm that depends on a single parameter, the allowable distortion. At any given distortion, the algorithm automatically determines the corresponding number and dimension of the groups and does not involve any parameter estimation. Simulation results reveal intriguing phase-transition-like behaviors of the number of segments when changing the level of distortion or the amount of outliers. Finally, we demonstrate how this technique can be readily applied to segment real imagery and bioinformatic data.
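The abstract describes the method only at a high level. The sketch below is one possible reading of it, and its details are assumptions that this record does not state. It assumes a Gaussian coding length for a group of m points in R^d with mean mu, at distortion eps^2: L = (m + d)/2 * log2 det(I + d/(eps^2 m) * W W^T) + d/2 * log2(1 + |mu|^2 / eps^2), where W holds the mean-subtracted points. It also assumes a membership cost of -|W_i| * log2(|W_i| / N) bits per group, and a greedy bottom-up scheme that merges whichever pair of groups saves the most bits and stops once no merge helps. The function names (`coding_length`, `segmented_length`, `segment`) are hypothetical. As in the abstract, eps is the only tuning parameter, and the number of groups is not fixed in advance.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _logdet_spd(a: List[List[float]]) -> float:
    """Natural log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    logdet = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][i] = math.sqrt(s)
                logdet += 2.0 * math.log(chol[i][i])
            else:
                chol[i][j] = s / chol[j][j]
    return logdet


def coding_length(points: List[Point], eps: float) -> float:
    """Assumed Gaussian coding length (bits) of one group at distortion eps**2."""
    m, d = len(points), len(points[0])
    mu = [sum(p[k] for p in points) / m for k in range(d)]
    centered = [[p[k] - mu[k] for k in range(d)] for p in points]
    scale = d / (eps ** 2 * m)
    # Build I + scale * W W^T, where W is the d x m matrix of mean-subtracted points.
    gram = [[(1.0 if i == j else 0.0) + scale * sum(c[i] * c[j] for c in centered)
             for j in range(d)] for i in range(d)]
    bits = (m + d) / 2.0 * _logdet_spd(gram) / math.log(2.0)
    # Extra bits to code the group mean itself.
    bits += d / 2.0 * math.log2(1.0 + sum(x * x for x in mu) / eps ** 2)
    return bits


def segmented_length(groups: List[List[Point]], eps: float, n_total: int) -> float:
    """Sum of per-group coding lengths plus the bits needed to record memberships."""
    return sum(coding_length(g, eps) - len(g) * math.log2(len(g) / n_total)
               for g in groups)


def segment(points: List[Point], eps: float) -> List[List[Point]]:
    """Greedy bottom-up merging: start from singletons, merge while it saves bits."""
    n_total = len(points)
    groups: List[List[Point]] = [[p] for p in points]
    while len(groups) > 1:
        best: Tuple[int, int] = (-1, -1)
        best_gain = 0.0
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # The total is additive over groups, so only the pair (a, b) matters.
                before = segmented_length([groups[a], groups[b]], eps, n_total)
                after = segmented_length([groups[a] + groups[b]], eps, n_total)
                if before - after > best_gain:
                    best_gain, best = before - after, (a, b)
        if best_gain <= 0.0:
            break  # no merge shortens the total code; stop
        a, b = best
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return groups


points = [(10.0, 0.0)] * 10 + [(0.0, 10.0)] * 10
groups = segment(points, eps=1.0)
print(len(groups), sorted(len(g) for g in groups))  # prints: 2 [10, 10]
```

With eps = 1.0, the two clusters stay separate. Merging them would cost roughly 11 * log2(101) + log2(51) bits for the combined group, because the log-det term grows with the spread between the clusters. That is far more than coding each cluster on its own. The exhaustive pair search costs O(N^2) coding-length evaluations per merge, so this sketch is for illustration only, not for large data sets.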
Pages: 1546-1562
Page count: 17