Segmentation of Multivariate Mixed Data via Lossy Data Coding and Compression

Cited by: 11
Authors
Ma, Yi
Derksen, Harm
Hong, Wei
Wright, John
Affiliations
[1] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL 61801 USA
[2] Univ Michigan, Dept Math, Ann Arbor, MI 48109 USA
[3] Texas Instruments Inc, DSP Solut Res & Dev Ctr, Dallas, TX 75266 USA
Funding
U.S. National Science Foundation
Keywords
multivariate mixed data; data segmentation; data clustering; rate distortion; lossy coding; lossy compression; image segmentation; microarray data clustering
DOI
10.1109/TPAMI.2007.1085
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, based on ideas from lossy data coding and compression, we present a simple but effective technique for segmenting multivariate mixed data that are drawn from a mixture of Gaussian distributions, which are allowed to be almost degenerate. The goal is to find the optimal segmentation that minimizes the overall coding length of the segmented data, subject to a given distortion. By analyzing the coding length/rate of mixed data, we formally establish some strong connections of data segmentation to many fundamental concepts in lossy data compression and rate-distortion theory. We show that a deterministic segmentation is approximately the (asymptotically) optimal solution for compressing mixed data. We propose a very simple and effective algorithm that depends on a single parameter, the allowable distortion. At any given distortion, the algorithm automatically determines the corresponding number and dimension of the groups and does not involve any parameter estimation. Simulation results reveal intriguing phase-transition-like behaviors of the number of segments when changing the level of distortion or the amount of outliers. Finally, we demonstrate how this technique can be readily applied to segment real imagery and bioinformatic data.
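The idea in the abstract, choosing the segmentation that minimizes total coding length at a fixed distortion, can be illustrated with a small sketch. It is not the authors' implementation. It assumes a zero-mean log-determinant estimate of the coding length of Gaussian data and a simple greedy pairwise merge; the exact expressions and procedure in the paper may differ. The names `coding_length`, `segmented_length`, `greedy_merge`, and the parameter `eps` (the allowable distortion) are hypothetical.

```python
import numpy as np


def coding_length(V, eps):
    """Approximate number of bits needed to code the columns of V (n x m)
    up to distortion eps**2. Assumption: zero-mean log-det estimate."""
    n, m = V.shape
    if m == 0:
        return 0.0
    gram = np.eye(n) + (n / (eps ** 2 * m)) * (V @ V.T)
    _, logdet = np.linalg.slogdet(gram)  # natural log of the determinant
    return 0.5 * (m + n) * logdet / np.log(2.0)


def segmented_length(V, labels, eps):
    """Total bits for a segmentation: per-group coding length plus the
    bits needed to record which group each sample belongs to."""
    m = V.shape[1]
    total = 0.0
    for k in np.unique(labels):
        Vk = V[:, labels == k]
        mk = Vk.shape[1]
        total += coding_length(Vk, eps) - mk * np.log2(mk / m)
    return total


def greedy_merge(V, eps):
    """Start with every sample in its own group and repeatedly merge the
    pair of groups that reduces the total coding length the most."""
    m = V.shape[1]
    groups = [[i] for i in range(m)]

    def cost(idx):
        return coding_length(V[:, idx], eps) - len(idx) * np.log2(len(idx) / m)

    while len(groups) > 1:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                merged = groups[a] + groups[b]
                gain = cost(groups[a]) + cost(groups[b]) - cost(merged)
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, a, b)
        if best is None:  # no merge shortens the code: stop
            break
        _, a, b = best
        groups[a] = groups[a] + groups[b]
        del groups[b]

    labels = np.empty(m, dtype=int)
    for k, g in enumerate(groups):
        labels[g] = k
    return labels
```

In this sketch the only tuning parameter is `eps`, and the number of groups falls out of the merging process rather than being specified in advance, which mirrors the single-parameter behavior the abstract describes.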
Pages: 1546-1562
Number of pages: 17