50 entries in total
- [32] HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models. 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, 2023: 6449-6464.
- [33] Hybrid language models for out of vocabulary word detection in large vocabulary conversational speech recognition. 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. I, Proceedings: Speech Processing, 2004: 745-748.
- [34] GRASP: A Novel Benchmark for Evaluating Language GRounding and Situated Physics Understanding in Multimodal Language Models. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024, 2024: 6297-6305.
- [36] ComplexCodeEval: A Benchmark for Evaluating Large Code Models on More Complex Code. Proceedings - 2024 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024: 1895-1906.
- [37] VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models. International Conference on Machine Learning, Vol. 162, 2022.
- [39] Transition movement models for large vocabulary continuous sign language recognition. Sixth IEEE International Conference on Automatic Face and Gesture Recognition, Proceedings, 2004: 553-558.
- [40] Towards a benchmark dataset for large language models in the context of process automation. Digital Chemical Engineering, 2024, 13.