Linguistic Summarization using a Weighted N-gram Language Model based on the Similarity of Time-series Data

Cited by: 0
Authors:
Aoki, Kasumi [1]
Kobayashi, Ichiro [2]
Affiliations:
[1] Ochanomizu Univ, Fac Sci, Dept Informat Sci, Tokyo, Japan
[2] Ochanomizu Univ, Grad Sch Humanities & Sci, Adv Sci, Tokyo, Japan
Keywords: -
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
This paper describes a method for verbalizing trends in time-series data. Using the Nikkei Stock Average as an example, we develop a method that generates natural language sentences describing how the stock price moves in the market. The approach to producing linguistic descriptions of stock price trends proceeds in three steps. First, all the time series, including a newly observed series, i.e., the target to be verbalized, are grouped by spectral clustering, with Dynamic Time Warping (DTW) distance as the similarity metric. Second, a bi-gram language model for the newly observed series is built as a weighted combination of the bi-gram language models of the other time series in the same cluster, where each weight is determined by the similarity between the target series and that series. Finally, a linguistic summary of the target series is generated by finding the most likely word sequence under the weighted bi-gram model by means of dynamic programming. Experiments with various numbers of clusters in spectral clustering confirm that the proposed method generates natural language sentences that properly describe the trends of the stock price.
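
The following is a minimal sketch, not the authors' implementation, of the pipeline the abstract outlines: DTW distances, spectral clustering over a similarity matrix, a similarity-weighted bi-gram model, and a dynamic-programming search for the most likely word sequence. All function names (dtw_distance, cluster_series, bigram_counts, weighted_bigram, decode), the Gaussian affinity, and the fixed-length Viterbi-style search are assumptions made for illustration; the paper does not specify these details.

    # Sketch only: hypothetical helper names, not the authors' code.
    # DTW distances -> spectral clustering -> similarity-weighted bi-gram -> DP decoding.
    import numpy as np
    from collections import defaultdict
    from sklearn.cluster import SpectralClustering

    def dtw_distance(a, b):
        """Dynamic Time Warping distance between two 1-D series."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    def cluster_series(series, n_clusters=3):
        """Spectral clustering on a DTW-based affinity matrix; returns labels and distances."""
        n = len(series)
        dist = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                dist[i, j] = dist[j, i] = dtw_distance(series[i], series[j])
        affinity = np.exp(-dist / (dist.std() + 1e-9))  # Gaussian kernel; bandwidth is a guess
        labels = SpectralClustering(n_clusters=n_clusters,
                                    affinity="precomputed").fit_predict(affinity)
        return labels, dist

    def bigram_counts(sentences):
        """Raw bi-gram counts from tokenized sentences, with <s>/</s> boundary markers."""
        counts = defaultdict(lambda: defaultdict(float))
        for sent in sentences:
            toks = ["<s>"] + sent + ["</s>"]
            for w1, w2 in zip(toks, toks[1:]):
                counts[w1][w2] += 1.0
        return counts

    def weighted_bigram(models, weights):
        """Combine same-cluster bi-gram counts, each weighted by similarity to the target."""
        combined = defaultdict(lambda: defaultdict(float))
        for model, w in zip(models, weights):
            for w1, nexts in model.items():
                for w2, c in nexts.items():
                    combined[w1][w2] += w * c
        # Normalize to conditional probabilities P(w2 | w1).
        return {w1: {w2: c / sum(nexts.values()) for w2, c in nexts.items()}
                for w1, nexts in combined.items()}

    def decode(probs, max_len=12):
        """DP (Viterbi-style) search: keep the best-scoring path into each word at each
        step and return the highest-scoring path that ends with </s>."""
        best = {"<s>": (0.0, ["<s>"])}   # word -> (log-prob, path so far)
        finished = []
        for _ in range(max_len):
            nxt = {}
            for w1, (lp, path) in best.items():
                for w2, p in probs.get(w1, {}).items():
                    cand = (lp + np.log(p), path + [w2])
                    if w2 == "</s>":
                        finished.append(cand)
                    elif w2 not in nxt or cand[0] > nxt[w2][0]:
                        nxt[w2] = cand
            if not nxt:
                break
            best = nxt
        return max(finished, key=lambda c: c[0])[1][1:] if finished else []

    # Usage sketch: series[0] is the target (no description yet); descs[i] is the
    # tokenized description attached to series[i] for i >= 1.
    # labels, dist = cluster_series(series)
    # same = [i for i in range(1, len(series)) if labels[i] == labels[0]]
    # weights = [np.exp(-dist[0, i]) for i in same]
    # models = [bigram_counts([descs[i]]) for i in same]
    # summary = " ".join(decode(weighted_bigram(models, weights)))

Because only observed bi-grams receive nonzero probability, the sketch omits smoothing, which the abstract does not specify.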
Pages: 595-601
Number of pages: 7
Related papers (50 records in total):
  • [1] Symbolic Translation of Time Series using Piecewise N-gram Similarity Voting
    Delannoy, Siegfried
    Caillault, Emilie
    Bigand, Andre
    Rousseeuw, Kevin
    PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS (ICPRAM), 2021, : 327 - 333
  • [2] A WEIGHTED AVERAGE N-GRAM MODEL OF NATURAL-LANGUAGE
    OBOYLE, P
    OWENS, M
    SMITH, FJ
    COMPUTER SPEECH AND LANGUAGE, 1994, 8 (04): : 337 - 349
  • [3] An Approach to Linguistic Summarization based on Comparison among Multiple Time-series Data
    Kobayashi, Mizuki
    Kobayashi, Ichiro
    6TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS, AND THE 13TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS, 2012, : 1100 - 1103
  • [4] Blind data linkage using n-gram similarity comparisons
    Churches, T
    Christen, P
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2004, 3056 : 121 - 126
  • [5] Splitting input for machine translation using N-gram language model together with utterance similarity
    Doi, T
    Sumita, E
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2005, E88D (06): : 1256 - 1264
  • [6] SPANISH LINGUISTIC STEGANOGRAPHY BASED ON N-GRAM MODEL AND ZIPF LAW
    Muñoz Muñoz, Alfonso
    Argüelles Álvarez, Irina
    ARBOR-CIENCIA PENSAMIENTO Y CULTURA, 2014, 190 (768)
  • [7] Discovery of Corrosion Patterns using Symbolic Time Series Representation and N-gram Model
    Taib, Shakirah Mohd
    Zabidi, Zahiah Akhma Mohd
    Aziz, Izzatdin Abdul
    Mousor, Farahida Hanim
    Abu Bakar, Azuraliza
    Mokhtar, Ainul Akmar
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2018, 9 (12) : 554 - 560
  • [8] UNSUPERVISED LANGUAGE MODEL ADAPTATION USING N-GRAM WEIGHTING
    Haidar, Md. Akmal
    O'Shaughnessy, Douglas
    2011 24TH CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE), 2011, : 857 - 860
  • [9] Bangla Word Clustering Based on N-gram Language Model
    Ismail, Sabir
    Rahman, M. Shahidur
    2014 1ST INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATION & COMMUNICATION TECHNOLOGY (ICEEICT 2014), 2014,
  • [10] Linguistic Summarization of Time Series Data using Genetic Algorithms
    Castillo-Ortega, Rita
    Marin, Nicolas
    Sanchez, Daniel
    Tettamanzi, Andrea G. B.
    PROCEEDINGS OF THE 7TH CONFERENCE OF THE EUROPEAN SOCIETY FOR FUZZY LOGIC AND TECHNOLOGY (EUSFLAT-2011) AND LFA-2011, 2011, : 416 - 423