Building Statistical Language Models of Code

Authors
Schulam, Peter [1]
Rosenfeld, Roni [1]
Devanbu, Premkumar [2]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
[2] Univ Calif Davis, Dept Comp Sci, Davis, CA USA
Keywords
DOI
Not available
Chinese Library Classification
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
We present the Source Code Statistical Language Model data analysis pattern. Statistical language models have been an enabling tool for a wide array of important language technologies. Speech recognition, machine translation, and document summarization (to name a few) all rely on statistical language models to assign probability estimates to natural language utterances or sentences. In this data analysis pattern, we describe the process of building n-gram language models over software source files. We hope that by introducing the empirical software engineering community to best practices that have been established over the years in research for natural languages, statistical language models can become a tool that SE researchers are able to use to explore new research directions.
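As a rough illustration of what building an n-gram language model over source files can involve, the sketch below trains a bigram model on tokenized code and scores new token sequences by cross-entropy. It is not the authors' procedure: the regex tokenizer, the function names (`tokenize`, `train_bigram`, `prob`, `cross_entropy`), and the use of add-one smoothing are all assumptions chosen for brevity, and better-established smoothing methods would normally replace add-one in practice.

```python
import math
import re
from collections import Counter

def tokenize(source):
    # Hypothetical tokenizer: identifiers, integer literals,
    # or any other single non-space character.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)

def train_bigram(token_lists):
    # Count contexts and bigrams over sequences padded with
    # start (<s>) and end (</s>) markers.
    contexts = Counter()
    bigrams = Counter()
    vocab = set()
    for tokens in token_lists:
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, vocab

def prob(model, prev, cur):
    # Add-one (Laplace) smoothed estimate of P(cur | prev).
    # One extra vocabulary slot is reserved for all unseen tokens.
    contexts, bigrams, vocab = model
    v = len(vocab) + 1
    return (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)

def cross_entropy(model, tokens):
    # Average negative log2 probability per predicted token (bits).
    padded = ["<s>"] + tokens + ["</s>"]
    pairs = list(zip(padded, padded[1:]))
    log_p = sum(math.log2(prob(model, p, c)) for p, c in pairs)
    return -log_p / len(pairs)

# Toy corpus standing in for a collection of source files.
corpus = ["x = x + 1", "y = y + 1"]
model = train_bigram([tokenize(line) for line in corpus])

seen = cross_entropy(model, tokenize("x = x + 1"))
scrambled = cross_entropy(model, tokenize("1 + = x x"))
print(seen < scrambled)
```

Lower cross-entropy means the model finds a token sequence more predictable, so a line that resembles the training code should score lower than a scrambled version of the same tokens.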
Pages: 1-3
Page count: 3