Photo Semantic Understanding and Retargeting by a Noise-Robust Regularized Topic Model

Authors
Wang, Guifeng [1]
Zhang, Luming [1]
Li, Yongbin [1]
Sheng, Yichuan [1]
Affiliation
[1] Jinhua Polytech, Key Lab Crop Harvesting Equipment Technol Zhejiang, Jinhua 321007, Peoples R China
Keywords
Aerial photo; deep feature; matrix factorization; probabilistic model; retargeting; COMMUNITIES; ALGORITHM
DOI
10.1109/JSTARS.2023.3247745
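Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809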
Abstract
Retargeting aims at displaying a photo with an arbitrary aspect ratio, wherein the visually/semantically prominent objects are appropriately preserved and visual distortions are well alleviated. Conventional retargeting models are built upon the visual perception of photos from a family of prespecified communities (e.g., "portrait"), wherein the underlying community-specific features are not learned explicitly. Thus, they cannot appropriately retarget aerial photos, which contain a rich variety of objects at different scales. In this article, a novel aerial photo retargeting framework is designed by encoding the deep features from automatically detected Google Maps (https://www.google.com/maps) communities into a regularized probabilistic model. Specifically, we first propose an enhanced matrix factorization (MF) algorithm to calculate communities based on million-scale Google Maps pictures, for each of which a deep feature is learned simultaneously. The enhanced MF collaboratively incorporates label denoising, between-community correlation, and deep feature encoding. Subsequently, a probabilistic model called the latent topic model (LTM) is designed to quantify the spatial layouts of multiple Google Maps communities in the underlying hidden space. To alleviate overfitting caused by Google Maps communities with imbalanced numbers of aerial photos, a regularizer is added to the LTM. Finally, by leveraging the regularized LTM, we shrink the test photo horizontally/vertically to maximize the posterior probability of the retargeted photo. Comprehensive subjective evaluations and visualizations have demonstrated the advantages of our method. In addition, our calculated Google Maps communities are competitively consistent with the ground truth, according to quantitative comparisons on the 2M Google Maps photos.
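The abstract does not give the update rules of the enhanced MF, so the following is only a minimal sketch of the general idea it builds on: factorizing a non-negative photo-by-feature matrix under a regularizer and reading community memberships from the factors. It uses plain L2-regularized non-negative matrix factorization with multiplicative updates, which is a swapped-in standard technique and not the authors' algorithm; label denoising, between-community correlation, and deep feature learning are omitted. The function name `regularized_nmf`, the parameter `lam`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def regularized_nmf(X, rank, lam=0.1, iters=200, seed=0, eps=1e-9):
    """L2-regularized NMF (illustrative stand-in, not the paper's enhanced MF).

    Approximately minimizes ||X - W H||_F^2 + lam * (||W||_F^2 + ||H||_F^2)
    subject to W >= 0 and H >= 0, using multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.random((n, rank)) + eps  # photo-to-community weights
    H = rng.random((rank, d)) + eps  # community-to-feature basis
    losses = []
    for _ in range(iters):
        # Multiplicative updates keep both factors non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + lam * H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + lam * W + eps)
        fit = np.linalg.norm(X - W @ H) ** 2
        penalty = lam * (np.sum(W ** 2) + np.sum(H ** 2))
        losses.append(fit + penalty)
    return W, H, losses


# Hypothetical toy data: 60 "photos" with 16 non-negative features each,
# drawn from two groups that are active on different feature dimensions.
rng = np.random.default_rng(42)
group_a = np.hstack([rng.random((30, 8)) + 1.0, 0.05 * rng.random((30, 8))])
group_b = np.hstack([0.05 * rng.random((30, 8)), rng.random((30, 8)) + 1.0])
X = np.vstack([group_a, group_b])

W, H, losses = regularized_nmf(X, rank=2, lam=0.1)
communities = W.argmax(axis=1)  # hard community assignment per photo
print("first/last objective:", losses[0], losses[-1])
print("community sizes:", np.bincount(communities, minlength=2))
```

Here the penalty weight `lam` plays the role of a generic shrinkage term; the regularizer described in the abstract targets imbalance between communities and would take a different form.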
Pages: 3495-3505
Page count: 11