In this paper, we consider probabilistic context-free grammars, a class of generative devices that has been successfully exploited in several applications of syntactic pattern matching, especially in statistical natural language parsing. We investigate the problem of training probabilistic context-free grammars by minimizing the cross-entropy with respect to distributions defined over an infinite set of trees or an infinite set of sentences. This problem arises in the context-free approximation of distributions generated by more expressive statistical models. We show several interesting theoretical properties of probabilistic context-free grammars estimated in this way, including a previously unknown equivalence between the cross-entropy of the grammar with respect to the input distribution and the so-called derivational entropy of the grammar itself. We discuss important consequences of these results for the standard application of the maximum-likelihood estimator to finite tree and sentence samples, as well as for finite-state models such as Hidden Markov Models and probabilistic finite automata.
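The equivalence between cross-entropy and derivational entropy can be checked numerically in the simplest setting the abstract mentions: maximum-likelihood (relative-frequency) estimation on a finite tree sample, where the input distribution p_T is the empirical one. The Python sketch below is an illustration under stated assumptions, not the authors' code. The toy treebank, the start symbol "S", and all function names are hypothetical. It computes the cross-entropy H(p_T, p_G) = -sum_t p_T(t) log p_G(t) directly from the trees. It computes the derivational entropy as H_d(G) = sum_A E[#A] * H_A, where the expected nonterminal counts E[#A] come from solving a linear system exactly.

```python
from collections import Counter
from fractions import Fraction
import math

# Hypothetical toy treebank. A tree is a tuple (label, child, ...); a leaf is a
# terminal word (str). Assumption: words never coincide with nonterminal labels.
TREEBANK = [
    ("S", ("NP", "she"), ("VP", ("V", "sleeps"))),
    ("S", ("NP", "she"), ("VP", ("V", "sees"), ("NP", "him"))),
    ("S", ("NP", ("NP", "him"), ("PP", ("P", "with"), ("NP", "her"))),
          ("VP", ("V", "sleeps"))),
]


def rules_of(tree):
    """Yield every rule (A, alpha) used in a tree, one entry per occurrence."""
    label, *children = tree
    yield (label, tuple(c[0] if isinstance(c, tuple) else c for c in children))
    for child in children:
        if isinstance(child, tuple):
            yield from rules_of(child)


def relative_frequency(treebank):
    """Maximum-likelihood estimate p(A -> alpha) = c(A -> alpha) / c(A)."""
    rule_counts = Counter(r for tree in treebank for r in rules_of(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {r: Fraction(n, lhs_counts[r[0]]) for r, n in rule_counts.items()}


def cross_entropy(treebank, probs):
    """H(p_T, p_G) = -sum_t p_T(t) log p_G(t), with p_T the empirical distribution."""
    total = sum(math.log(float(probs[r])) for tree in treebank for r in rules_of(tree))
    return -total / len(treebank)


def expected_nonterminal_counts(probs, start):
    """Solve x_B = [B == start] + sum_{A -> alpha} x_A * p(A -> alpha) * #B(alpha)
    exactly, by Gauss-Jordan elimination over Fractions."""
    nts = sorted({lhs for lhs, _ in probs})
    idx = {a: i for i, a in enumerate(nts)}
    n = len(nts)
    # Augmented matrix for (I - M^T) x = e_start, where M[A][B] = E[#B in rhs of A].
    mat = [[Fraction(int(i == j)) for j in range(n)] + [Fraction(int(nts[i] == start))]
           for i in range(n)]
    for (lhs, rhs), p in probs.items():
        for sym in rhs:
            if sym in idx:  # only nonterminal occurrences contribute
                mat[idx[sym]][idx[lhs]] -= p
    for col in range(n):
        pivot = next(r for r in range(col, n) if mat[r][col] != 0)
        mat[col], mat[pivot] = mat[pivot], mat[col]
        pv = mat[col][col]
        mat[col] = [v / pv for v in mat[col]]
        for r in range(n):
            if r != col and mat[r][col] != 0:
                factor = mat[r][col]
                mat[r] = [a - factor * b for a, b in zip(mat[r], mat[col])]
    return {a: mat[idx[a]][n] for a in nts}


def derivational_entropy(probs, start):
    """H_d(G) = sum_A E[#A] * H_A, with H_A = -sum_alpha p(A -> alpha) log p(A -> alpha)."""
    counts = expected_nonterminal_counts(probs, start)
    return -sum(float(counts[lhs] * p) * math.log(float(p))
                for (lhs, _), p in probs.items())


probs = relative_frequency(TREEBANK)
h_cross = cross_entropy(TREEBANK, probs)
h_deriv = derivational_entropy(probs, "S")
print(f"cross-entropy: {h_cross:.6f} nats, derivational entropy: {h_deriv:.6f} nats")
```

Exact `Fraction` arithmetic keeps the expected counts exact. In this toy treebank they equal the per-tree empirical averages (for instance, E[#NP] = 2 and E[#PP] = 1/3). The expected rule counts under the estimated grammar therefore match the empirical ones, and both entropies reduce to the same sum over rules. The sketch assumes the estimated grammar is consistent, as it is for relative-frequency estimates from a finite treebank, so that the linear system has a unique solution.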
Affiliation: United Arab Emirates Univ, Coll Informat Technol, Dept Comp Sci & Software Engn, Al Ain 15551, U Arab Emirates
Turaev, Sherzod
Abdulghafor, Rawad
Alwan, Ali Amer
Abd Almisreb, Ali
Affiliation: Int Univ Sarajevo, Fac Engn & Nat Sci, Sarajevo 71210, Bosnia & Herceg; United Arab Emirates Univ, Coll Informat Technol, Dept Comp Sci & Software Engn, Al Ain 15551, U Arab Emirates
Affiliation: Univ Bordeaux, CNRS, UMR 5800, LaBRI, 351 Cours Liberat, F-33405 Talence, France
Bauderon, Michel
Chen, Rui
Affiliation: Univ Bordeaux, CNRS, UMR 5800, LaBRI, 351 Cours Liberat, F-33405 Talence, France
Ly, Olivier
Affiliation: Univ Bordeaux, CNRS, UMR 5800, LaBRI, 351 Cours Liberat, F-33405 Talence, France