Z-Glyph: Visualizing outliers in multivariate data

Cited by: 32
Authors
Cao, Nan [1]
Lin, Yu-Ru [2]
Gotz, David [3]
Du, Fan [4]
Affiliations
[1] Tongji Univ, Shanghai, Peoples R China
[2] Univ Pittsburgh, Pittsburgh, PA USA
[3] Univ N Carolina, Chapel Hill, NC USA
[4] Univ Maryland, College Pk, MD 20742 USA
Funding
National Science Foundation (USA);
Keywords
Outlier detection; anomaly detection; information visualization; multidimensional data visualization; INTERACTIVE VISUALIZATION; INTRUSION; TAXONOMY; NUMBER;
DOI
10.1177/1473871616686635
Chinese Library Classification number
TP31 [Computer software];
Discipline code
081202; 0835;
Abstract
Outlier analysis techniques are used extensively in many domains, such as intrusion detection. Even with today's most advanced statistical learning techniques, human judgment still plays an important role in outlier analysis tasks, because outlier examples are difficult to define and collect. This work tackles this problem by introducing a new visualization design, Z-Glyph, a family of glyphs designed to support human judgment in the outlier analysis of multivariate data. By employing a location-scale transformation, a Z-Glyph represents normal data using regular shapes (e.g. a straight line or a circle), so that abnormal data are revealed as deviations from these regular shapes. An extensive controlled experiment and case studies based on real-world datasets indicate that the Z-Glyph family outperforms the baseline designs, suggesting that the proposed design is able to leverage human perceptual features together with statistical characterization. This study contributes to a more fundamental understanding of how to design visual representations that reveal outliers in multivariate data, which can be applied as a building block in many domain-specific anomaly detection applications.
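The abstract describes the core mechanism only at a high level: a location-scale transformation maps typical records onto a regular shape, so outliers appear as visible departures from that shape. The sketch below illustrates that idea under our own assumptions and is not the authors' implementation. It assumes the transformation is a per-attribute z-score (mean and population standard deviation), draws each record either as a profile line (vertex i at height z_i) or as a star glyph (spoke i of length `base_radius + z_i`), and measures deviation from the regular shape as the largest |z_i|. The function names `zscore_columns`, `line_glyph`, `circle_glyph`, `deviation`, the `base_radius` parameter, and the toy data are all hypothetical.

```python
import math
from statistics import mean, pstdev

# Hypothetical sketch of a z-score style location-scale glyph; not the paper's code.


def zscore_columns(rows):
    """Location-scale transform: per column, subtract the mean and divide by the
    population standard deviation (a zero-variance column keeps scale 1.0)."""
    cols = list(zip(*rows))
    locs = [mean(c) for c in cols]
    scales = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, locs, scales)] for row in rows]


def line_glyph(z):
    """Profile glyph: vertex i sits at (i, z_i); a record at the column means
    (all z_i == 0) is drawn as the flat line y = 0."""
    return [(float(i), zi) for i, zi in enumerate(z)]


def circle_glyph(z, base_radius=3.0):
    """Star glyph: spoke i has length base_radius + z_i (clamped at 0 so it never
    crosses the centre); a record at the column means is drawn as a circle."""
    n = len(z)
    points = []
    for i, zi in enumerate(z):
        r = max(0.0, base_radius + zi)
        angle = 2 * math.pi * i / n
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points


def deviation(z):
    """Distance from the regular shape: the largest absolute z-score."""
    return max(abs(zi) for zi in z)


if __name__ == "__main__":
    # Hypothetical toy data: nine typical records plus one unusual in column 2.
    rows = [[10, 100], [11, 101], [9, 99], [10, 100], [10, 100],
            [11, 99], [9, 101], [10, 100], [10, 100], [10, 130]]
    scores = [deviation(z) for z in zscore_columns(rows)]
    print(max(range(len(rows)), key=scores.__getitem__))  # prints 9
```

A record sitting exactly at the column means has every z_i equal to 0, so its line glyph is flat and its star glyph is a circle of radius `base_radius`; anything else bends the shape in proportion to its standardized deviation. Because the mean and standard deviation are themselves pulled by outliers, a robust location-scale pair such as the median and MAD may work better in practice; that alternative is likewise our assumption, not something this record describes.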
Pages: 22-40
Page count: 19
Related papers
  • [1] Visualizing Multidimensional Data with Glyph SPLOMs
    Yates, A.
    Webb, A.
    Sharpnack, M.
    Chamberlin, H.
    Huang, K.
    Machiraju, R.
    COMPUTER GRAPHICS FORUM, 2014, 33 (03) : 301 - 310
  • [2] ON THE DETECTION OF MULTIVARIATE DATA OUTLIERS AND REGRESSION OUTLIERS
    LAZRAQ, A
    CLEROUX, R
    DATA ANALYSIS, LEARNING SYMBOLIC AND NUMERIC KNOWLEDGE, 1989 : 133 - 140
  • [3] Correlation of Outliers in Multivariate Data
    Kaszuba, Bartosz
    DATA ANALYSIS, MACHINE LEARNING AND KNOWLEDGE DISCOVERY, 2014 : 265 - 272
  • [4] Identification of outliers in multivariate data
    Rocke, DM
    Woodruff, DL
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1996, 91 (435) : 1047 - 1061
  • [5] PROPAGATION OF OUTLIERS IN MULTIVARIATE DATA
    Alqallaf, Fatemah
    Van Aelst, Stefan
    Yohai, Victor J.
    Zamar, Ruben H.
    ANNALS OF STATISTICS, 2009, 37 (01) : 311 - 331
  • [6] A Glyph-based Multimodal Presentation of Multivariate Data
    Yasmin, Shamima
    25TH ACM SYMPOSIUM ON VIRTUAL REALITY SOFTWARE AND TECHNOLOGY (VRST 2019), 2019
  • [7] IDENTIFYING MULTIPLE OUTLIERS IN MULTIVARIATE DATA
    HADI, AS
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1992, 54 (03) : 761 - 771
  • [8] Detecting Outliers in Multivariate Laboratory Data
    Southworth, Harry
    JOURNAL OF BIOPHARMACEUTICAL STATISTICS, 2008, 18 (06) : 1178 - 1183
  • [9] Interpretation of multivariate outliers for compositional data
    Filzmoser, Peter
    Hron, Karel
    Reimann, Clemens
    COMPUTERS & GEOSCIENCES, 2012, 39 : 77 - 85
  • [10] Visualizing Big Data Outliers through Distributed Aggregation
    Wilkinson, Leland
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2018, 24 (01) : 256 - 266