A novel approach to measuring the scope of patent claims based on probabilities obtained from (large) language models

Cited by: 0
Authors
Ragot, Sebastien [1 ]
Affiliations
[1] E Blum & Co Ltd, Patent & Trademark Attorneys VSP, Vorderberg 11, CH-8044 Zurich, Switzerland
Keywords
Patent scope; Patent value; Patent claims; Language models; Large language models; GPT; Information theory; Self-information
DOI
10.1016/j.wpi.2024.102321
Chinese Library Classification (CLC)
G25 [Library science and librarianship]; G35 [Information science and information work]
Discipline Classification Codes
1205; 120501
Abstract
This work proposes to measure the scope of a patent claim as the reciprocal of the self-information contained in the claim. Self-information is calculated from the claim's probability of occurrence, which is obtained from a language model. Grounded in information theory, this approach rests on the assumption that an unlikely concept is more informative than a common one, insofar as it is more surprising. In turn, the more surprising the information required to define the claim, the narrower its scope. Seven language models are considered, ranging from the simplest models (in which each word or character has an identical probability) through intermediate models (based on average word or character frequencies) to large language models (LLMs) such as GPT-2 and davinci-002. Remarkably, when the simplest language models are used to compute the probabilities, the scope becomes proportional to the reciprocal of the number of words or characters in the claim, a metric already used in previous works. The approach is applied to multiple series of patent claims directed to distinct inventions, each series consisting of claims of gradually decreasing scope. The performance of the language models is then assessed through several ad hoc tests. The LLMs outperform the models based on word and character frequencies, which in turn outperform the simplest models based on word or character counts. Interestingly, however, the character count appears to be a more reliable indicator than the word count.
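To make the definition concrete: with self-information I(c) = -log2 P(c), the proposed scope is S(c) = 1/I(c). Under the simplest model, where every word has the same probability p, an N-word claim has P(c) = p^N, so I(c) = N * log2(1/p) and the scope is proportional to 1/N, which recovers the word-count metric mentioned in the abstract. The sketch below estimates I(c) with an LLM, for which P(c) is the product of conditional token probabilities; it is a minimal illustration, not the author's implementation. It assumes the Hugging Face transformers GPT-2 checkpoint as a stand-in for the models studied in the paper, and the helper name claim_scope and the example claims are illustrative.

```python
# Minimal sketch (not the paper's code): score a claim's scope as the
# reciprocal of its self-information, I(c) = -log2 P(c), with P(c)
# estimated by GPT-2 via Hugging Face transformers.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def claim_scope(claim_text: str) -> float:
    """Return 1 / I(claim), with I in bits (illustrative helper)."""
    ids = tokenizer(claim_text, return_tensors="pt").input_ids
    with torch.no_grad():
        # .loss is the mean negative log-likelihood per predicted token, in nats
        loss = model(ids, labels=ids).loss.item()
    nll_nats = loss * (ids.shape[1] - 1)      # total NLL over the token sequence
    self_info_bits = nll_nats / math.log(2)   # nats -> bits: I = -log2 P(claim)
    return 1.0 / self_info_bits               # scope = reciprocal of self-information

# Illustrative comparison: the longer, more specific claim should accumulate
# more bits of self-information and hence score a narrower scope.
print(claim_scope("A chair."))
print(claim_scope("A chair comprising four legs, a seat, and a backrest made of oak."))
```

In this framing, a terse broad claim requires few bits to state and scores a large scope, while a highly specific claim accumulates many bits of self-information and a correspondingly narrow scope.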
Pages: 29