Artificial intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry

Cited by: 146

Authors:

Kobis, Nils [1, 2, 3]

Mossink, Luca D. [1, 2]

Affiliations:

[1] Univ Amsterdam, Dept Econ, Amsterdam, Netherlands

[2] Univ Amsterdam, Ctr Expt Econ & Polit Decis Making CREED, Amsterdam, Netherlands

[3] Max Planck Inst Human Dev, Ctr Humans & Machines, Berlin, Germany

Source:

COMPUTERS IN HUMAN BEHAVIOR | 2021 / Vol. 114

Funding:

European Research Council;

Keywords:

Natural language generation; Computational creativity; Turing Test; Creativity; Machine behavior; ACCOUNTABILITY; TRANSPARENCY; PSYCHOLOGY; ALGORITHMS;

DOI:

10.1016/j.chb.2020.106553

Chinese Library Classification (CLC) number:

B84 [Psychology];

Discipline classification code:

04 ; 0402 ;

Abstract:

The release of openly available, robust natural language generation algorithms (NLG) has spurred much public attention and debate. One reason lies in the algorithms' purported ability to generate humanlike text across various domains. Empirical evidence using incentivized tasks to assess whether people (a) can distinguish and (b) prefer algorithm-generated versus human-written text is lacking. We conducted two experiments assessing behavioral reactions to the state-of-the-art Natural Language Generation algorithm GPT-2 (total N = 830). Using the identical starting lines of human poems, GPT-2 produced samples of poems. From these samples, either a random poem was chosen (Human-out-of-the-loop) or the best one was selected (Human-in-the-loop) and in turn matched with a human-written poem. In a new incentivized version of the Turing Test, participants failed to reliably detect the algorithmically generated poems in the Human-in-the-loop treatment, yet succeeded in the Human-out-of-the-loop treatment. Further, people reveal a slight aversion to algorithm-generated poetry, independent of whether participants were informed about the algorithmic origin of the poem (Transparency) or not (Opacity). We discuss what these results convey about the performance of NLG algorithms in producing human-like text and propose methodologies to study such learning algorithms in human-agent experimental settings.

Citations

Pages: 13

4 entries in total

[1] AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably
Porter, Brian
Machery, Edouard
SCIENTIFIC REPORTS, 2024, 14 (01)
[2] Quality and correctness of AI-generated versus human-written abstracts in psychiatric research papers
Hsu, Tien-Wei
Tseng, Ping-Tao
Tsai, Shih-Jen
Ko, Chih-Hung
Thompson, Trevor
Hsu, Chih-Wei
Yang, Fu-Chi
Tsai, Chia-Kuang
Tu, Yu-Kang
Yang, Szu-Nian
Liang, Chih-Sung
Su, Kuan-Pin
PSYCHIATRY RESEARCH, 2024, 341
[3] A Comparison of Human-Written Versus AI-Generated Text in Discussions at Educational Settings: Investigating Features for ChatGPT, Gemini and BingAI
Durak, Hatice Yildiz
Egin, Figen
Onan, Aytug
EUROPEAN JOURNAL OF EDUCATION, 2025, 60 (01)
[4] Detecting Artificial Intelligence-Generated Versus Human-Written Medical Student Essays: Semirandomized Controlled Study
Doru, Berin
Maier, Christoph
Busse, Johanna Sophie
Luecke, Thomas
Schoenhoff, Judith
Enax-Krumova, Elena
Hessler, Steffen
Berger, Maria
Tokic, Marianne
JMIR MEDICAL EDUCATION, 2025, 11
