50 entries in total
- [34] SafetyBench: Evaluating the Safety of Large Language Models. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Vol 1: Long Papers, 2024: 15537-15553
- [38] Evaluating Large Language Models on Their Accuracy and Completeness. Retina-The Journal of Retinal and Vitreous Diseases, 2025, 45(01): 128-132
- [39] Evaluating Intelligence and Knowledge in Large Language Models. Topoi-An International Review of Philosophy, 2025, 44(01): 163-173