Building Statistical Language Models of Code

Cited by: 0
Authors
Schulam, Peter [1]
Rosenfeld, Roni [1]
Devanbu, Premkumar [2]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
[2] Univ Calif Davis, Dept Comp Sci, Davis, CA USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
We present the Source Code Statistical Language Model data analysis pattern. Statistical language models have been an enabling tool for a wide array of important language technologies. Speech recognition, machine translation, and document summarization (to name a few) all rely on statistical language models to assign probability estimates to natural language utterances or sentences. In this data analysis pattern, we describe the process of building n-gram language models over software source files. We hope that by introducing the empirical software engineering community to best practices that have been established over the years in research for natural languages, statistical language models can become a tool that SE researchers are able to use to explore new research directions.
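The process the abstract describes, building an n-gram language model over source files, might look roughly like the sketch below. It is illustrative only and is not taken from the paper: the regex tokenizer, the add-one smoothing, the single reserved unknown-token slot, and the names `tokenize`, `train_ngram`, and `cross_entropy` are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(source):
    """Split source text into identifiers, digit runs, and single symbols.

    Assumption: a crude regex lexer stands in for a real language lexer.
    """
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def train_ngram(token_lists, n=3):
    """Count n-grams and their (n-1)-token contexts over tokenized files."""
    ngrams, contexts, vocab = Counter(), Counter(), set()
    for tokens in token_lists:
        # Pad so the first real token has a full context and the end is modeled.
        padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
        vocab.update(padded)
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            ngrams[context + (padded[i],)] += 1
            contexts[context] += 1
    return ngrams, contexts, vocab


def cross_entropy(tokens, model, n=3):
    """Average bits per token under the model, with add-one smoothing.

    Assumption: add-one smoothing is used only for simplicity; one extra
    vocabulary slot is reserved for tokens never seen in training.
    """
    ngrams, contexts, vocab = model
    vocab_size = len(vocab) + 1
    padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
    total_bits = 0.0
    for i in range(n - 1, len(padded)):
        context = tuple(padded[i - n + 1:i])
        count = ngrams[context + (padded[i],)]
        prob = (count + 1) / (contexts[context] + vocab_size)
        total_bits -= math.log2(prob)
    return total_bits / (len(padded) - (n - 1))


# Hypothetical two-file training corpus.
corpus = [
    "for i in range(n): total += i",
    "for j in range(m): total += j",
]
model = train_ngram([tokenize(text) for text in corpus], n=3)
familiar = cross_entropy(tokenize("for i in range(n): total += i"), model)
unfamiliar = cross_entropy(tokenize("while True: pass"), model)
```

In this toy setup, `familiar` comes out lower than `unfamiliar`, reflecting that the model assigns higher probability to code resembling its training data. A realistic study would use a language-aware lexer, a much larger corpus, and a better smoothing method than add-one.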
Pages: 1-3
Page count: 3