- [31] Baby steps in evaluating the capacities of large language models. Nature Reviews Psychology, 2023, 2(8): 451-452.
- [32] Evaluating the ability of large language models to emulate personality. Scientific Reports, 2025, 15(1).
- [33] Evaluating Large Language Models on Controlled Generation Tasks. 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, 2023: 3155-3168.
- [35] EconNLI: Evaluating Large Language Models on Economics Reasoning. Findings of the Association for Computational Linguistics: ACL 2024, 2024: 982-994.
- [36] Evaluating Large Language Models for Tax Law Reasoning. Intelligent Systems, BRACIS 2024, Part I, 2025, 15412: 460-474.
- [37] A Chinese Dataset for Evaluating the Safeguards in Large Language Models. Findings of the Association for Computational Linguistics: ACL 2024, 2024: 3106-3119.
- [40] DebugBench: Evaluating Debugging Capability of Large Language Models. Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2024: 4173-4198.