50 entries in total
- [11] BLESS: Benchmarking Large Language Models on Sentence Simplification. 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), 2023, pp. 13291-13309
- [12] TRAM: Benchmarking Temporal Reasoning for Large Language Models. Findings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 6389-6415
- [14] Benchmarking Large Language Models in Retrieval-Augmented Generation. Thirty-Eighth AAAI Conference on Artificial Intelligence, Vol. 38, No. 16, 2024, pp. 17754-17762
- [15] SEED-Bench: Benchmarking Multimodal Large Language Models. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 13299-13308
- [16] Quantifying Bias in Agentic Large Language Models: A Benchmarking Approach. 2024 5th Information Communication Technologies Conference, ICTC 2024, 2024, pp. 349-353
- [18] Benchmarking Large Language Models on CFLUE - A Chinese Financial Language Understanding Evaluation Dataset. Findings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 5673-5693
- [19] Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023): Long Papers, Vol. 1, 2023, pp. 14820-14835