50 entries in total
- [41] The Two Word Test as a semantic benchmark for large language models. SCIENTIFIC REPORTS, 2024, 14 (01)
- [42] Establishing vocabulary tests as a benchmark for evaluating large language models. PLOS ONE, 2024, 19 (12)
- [43] Why Personalized Large Language Models Fail to Do What Ethics is All About. AMERICAN JOURNAL OF BIOETHICS, 2023, 23 (10): 60-63
- [44] Leveraging large language models for predictive chemistry. Nature Machine Intelligence, 2024, 6: 161-169
- [45] HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models. 2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023: 6449-6464
- [49] What Do Language Models Hear? Probing for Auditory Representations in Language Models. PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024: 5435-5448
- [50] Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey. 17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023: 3299-3321