Learning manifolds from non-stationary streams

Cited: 0
Authors
Mahapatra, Suchismit [1 ]
Chandola, Varun [1 ]
Affiliations
[1] SUNY Buffalo, Dept Comp Sci, Buffalo, NY 14261 USA
Keywords
Manifold learning; Dimension reduction; Streaming data; Isomap; Gaussian process; Nonlinear dimensionality reduction; Eigenmaps
DOI
10.1186/s40537-023-00872-8
CLC number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Streaming adaptations of manifold-learning-based dimensionality reduction methods, such as Isomap, rest on the assumption that a small initial batch of observations suffices for exact learning of the manifold, while the remaining streaming data instances can be cheaply mapped onto it. However, there are no theoretical results showing that this core assumption is valid. Moreover, such methods typically assume that the underlying data distribution is stationary and are not equipped to detect, or handle, sudden changes or gradual drifts in the distribution that may occur while the data is streaming. We present theoretical results showing that the quality of the learned manifold converges asymptotically as the size of the data increases. We then show that a Gaussian Process Regression (GPR) model that uses a manifold-specific kernel function and is trained on an initial batch of sufficient size can closely approximate state-of-the-art streaming Isomap algorithms, and that the predictive variance obtained from the GPR prediction can serve as an effective detector of changes in the underlying data distribution. Results on several synthetic and real data sets show that the resulting algorithm can effectively learn lower-dimensional representations of high-dimensional data in a streaming setting while identifying shifts in the generative distribution. For instance, key findings on a gas sensor array data set show that our method can detect changes in the underlying data stream triggered by real-world factors, such as the introduction of a new gas into the system, while efficiently mapping the data onto a low-dimensional manifold.
Pages: 24
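To make the pipeline sketched in the abstract concrete, the snippet below is a minimal illustration of the core idea: learn a batch Isomap embedding from an initial sample, train a GPR model to map ambient points to manifold coordinates, and use the GPR predictive variance as a change detector. This is a sketch under stated assumptions, not the authors' implementation: the swiss-roll data, the standard RBF kernel (standing in for the paper's manifold-specific kernel), and the variance threshold are all illustrative choices.

```python
# Hedged sketch of the GPR-based streaming manifold-learning idea.
# Assumptions (not from the paper): swiss-roll data, a plain RBF
# kernel instead of the manifold-specific kernel, ad hoc threshold.
import numpy as np
from sklearn.manifold import Isomap
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Initial batch: a 2-D swiss-roll manifold embedded in 3-D ambient space.
t = rng.uniform(1.5 * np.pi, 4.5 * np.pi, 500)
h = rng.uniform(0.0, 10.0, 500)
X_batch = np.column_stack([t * np.cos(t), h, t * np.sin(t)])

# Exact (batch) Isomap embedding of the initial batch.
iso = Isomap(n_neighbors=10, n_components=2)
Y_batch = iso.fit_transform(X_batch)

# GPR maps ambient points to manifold coordinates; its predictive
# variance acts as the distribution-change detector.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=2.0), alpha=1e-4)
gpr.fit(X_batch, Y_batch)

def map_stream_point(x, var_threshold=0.5):
    """Embed one streaming point; flag possible drift when the GPR
    predictive variance exceeds an (ad hoc) threshold."""
    y, std = gpr.predict(x.reshape(1, -1), return_std=True)
    drift = bool(np.max(std ** 2) > var_threshold)
    return y[0], drift

# A point near the training manifold maps with low variance; a point
# far from it (mimicking a distribution shift) is flagged.
y_on, drift_on = map_stream_point(X_batch[0] + 0.01)
y_off, drift_off = map_stream_point(np.array([100.0, 100.0, 100.0]))
print(drift_on, drift_off)  # expected: False True
```

In this sketch each streaming point costs one GPR prediction rather than a full re-embedding, which is the efficiency argument made in the abstract; a calibrated threshold on the predictive variance would replace the fixed value used here.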
Related papers (showing 41-50 of 50)
  • [41] Lee, Hyunin; Ding, Yuhao; Lee, Jongmin; Jin, Ming; Lavaei, Javad; Sojoudi, Somayeh. Tempo Adaptation in Non-stationary Reinforcement Learning. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
  • [42] Bhardwaj, Kartikeya; Marculescu, Radu. Non-Stationary Bayesian Learning for Global Sustainability. IEEE Transactions on Sustainable Computing, 2017, 2(3): 304-316.
  • [43] Robinson, Joshua W.; Hartemink, Alexander J. Learning Non-Stationary Dynamic Bayesian Networks. Journal of Machine Learning Research, 2010, 11: 3647-3680.
  • [44] Husmeier, D. Learning non-stationary conditional probability distributions. Neural Networks, 2000, 13(3): 287-290.
  • [45] Feng, Fan; Huang, Biwei; Zhang, Kun; Magliacane, Sara. Factored Adaptation for Non-stationary Reinforcement Learning. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [46] Migenda, Nico; Moeller, Ralf; Schenck, Wolfram. NGPCA: Clustering of high-dimensional and non-stationary data streams. Software Impacts, 2024, 20.
  • [47] Parker, Brandon S.; Khan, Latifur; Bifet, Albert. Incremental Ensemble Classifier Addressing Non-Stationary Fast Data Streams. 2014 IEEE International Conference on Data Mining Workshop (ICDMW), 2014: 716-723.
  • [48] Abdi, Afshin; Fekri, Faramarz. Mixture Source Identification in Non-Stationary Data Streams with Applications in Compression. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017: 2502-2506.
  • [49] Ghazikhani, Adel; Monsefi, Reza; Yazdi, Hadi Sadoghi. Ensemble of online neural networks for non-stationary and imbalanced data streams. Neurocomputing, 2013, 122: 535-544.
  • [50] Korycki, Lukasz; Krawczyk, Bartosz. Online Oversampling for Sparsely Labeled Imbalanced and Non-Stationary Data Streams. 2020 International Joint Conference on Neural Networks (IJCNN), 2020.