50 items in total
- [31] RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 14743-14777
- [33] Benchmarking protein language models for protein crystallization. SCIENTIFIC REPORTS, 2025, 15 (01)
- [34] AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension. PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024: 1979-1998
- [35] MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models. BIG DATA MINING AND ANALYTICS, 2024, 7 (04): 1116-1128
- [36] Benchmarking and Evaluating Large Language Models in Phishing Detection for Small and Midsize Enterprises: A Comprehensive Analysis. IEEE ACCESS, 2025, 13: 28335-28352
- [37] InteNSE: Interpretability, Robustness, and Benchmarking in Neural Software Engineering (Second Edition: Large Language Models). Proc. - IEEE/ACM Int. Workshop Interpretability, Robust., Benchmarking Neural Softw. Eng. InteNSE, (VI)