This study compares three F1-score variants (micro, macro, and weighted) to assess how well each evaluates text-based emotion classification. Lexicon distillation is performed on two multilabel emotion-annotated datasets, XED and GoEmotions. The aim of this paper is to determine when each F1-score variant is best suited for evaluating text-based multilabel emotion classification. Unigram lexicons were derived from the annotated GoEmotions and XED datasets using a binary classification approach. The distilled lexicons were then applied to the GoEmotions and XED datasets to calculate the emotional content of their texts, and the results were compared. The findings show how each F1-score variant behaves under different class distributions and underline the importance of choosing an appropriate metric for reliable performance evaluation on imbalanced multilabel datasets. The study also investigates how aggregating negative emotions into broader categories affects these F1 metrics. Its contribution is to provide insight into how the choice of F1-score variant can improve the reliability of multilabel emotion classifier evaluation, particularly under class imbalance such as that found in phishing emails.
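The abstract does not spell out how the three variants are computed, so the following is a minimal sketch using the common definitions, assuming gold labels and predictions are encoded as 0/1 indicator rows with one column per emotion. Micro-F1 pools true positives, false positives and false negatives across all labels before computing a single F1; macro-F1 averages per-label F1 scores equally; weighted-F1 averages them in proportion to each label's support (number of true instances). The helper `aggregate`, which merges negative-emotion columns with a logical OR, the emotion names, and the toy data are all hypothetical illustrations, not the study's data or method. Scoring a label as 0.0 when it has no true or predicted instances is also an assumed convention.

```python
from typing import Dict, List, Sequence, Tuple

# Each row is one text; each column is a 0/1 indicator for one emotion label.
Matrix = Sequence[Sequence[int]]


def per_class_counts(y_true: Matrix, y_pred: Matrix) -> Tuple[List[int], List[int], List[int]]:
    """Count true positives, false positives and false negatives per label column."""
    n_classes = len(y_true[0])
    tp, fp, fn = [0] * n_classes, [0] * n_classes, [0] * n_classes
    for t_row, p_row in zip(y_true, y_pred):
        for k in range(n_classes):
            if t_row[k] and p_row[k]:
                tp[k] += 1
            elif p_row[k]:
                fp[k] += 1  # predicted but not annotated
            elif t_row[k]:
                fn[k] += 1  # annotated but missed
    return tp, fp, fn


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); assumed to be 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_scores(y_true: Matrix, y_pred: Matrix) -> Dict[str, float]:
    """Return micro-, macro- and support-weighted F1 for multilabel indicator data."""
    tp, fp, fn = per_class_counts(y_true, y_pred)
    per_class = [f1_from_counts(a, b, c) for a, b, c in zip(tp, fp, fn)]
    support = [a + c for a, c in zip(tp, fn)]  # number of true instances per label
    total = sum(support)
    return {
        # micro: pool counts over all labels, then compute a single F1
        "micro": f1_from_counts(sum(tp), sum(fp), sum(fn)),
        # macro: unweighted mean of per-label F1, so rare labels count as much as common ones
        "macro": sum(per_class) / len(per_class),
        # weighted: per-label F1 averaged with weights proportional to support
        "weighted": sum(f * s for f, s in zip(per_class, support)) / total if total else 0.0,
    }


def aggregate(y: Matrix, groups: Sequence[Sequence[int]]) -> List[List[int]]:
    """Merge label columns: each output column is the logical OR of one group of input columns."""
    return [[int(any(row[i] for i in group)) for group in groups] for row in y]


# Hypothetical toy data; columns: joy, anger, sadness, fear.
Y_TRUE = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0],
          [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 1, 0, 0]]
Y_PRED = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0],
          [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
# Hypothetical grouping: keep joy, merge anger, sadness and fear into one "negative" label.
GROUPS = [[0], [1, 2, 3]]

if __name__ == "__main__":
    fine = f1_scores(Y_TRUE, Y_PRED)
    coarse = f1_scores(aggregate(Y_TRUE, GROUPS), aggregate(Y_PRED, GROUPS))
    print({k: round(v, 3) for k, v in fine.items()})    # {'micro': 0.667, 'macro': 0.5, 'weighted': 0.625}
    print({k: round(v, 3) for k, v in coarse.items()})  # {'micro': 0.933, 'macro': 0.929, 'weighted': 0.929}
```

On this toy data, macro-F1 (0.5) falls below micro-F1 (about 0.667) because the minority labels anger and sadness are never predicted correctly, while weighted-F1 (0.625) sits between them because the well-predicted majority label joy carries the most weight. Merging anger, sadness and fear into one negative label removes the anger/sadness confusions, so all three scores rise.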
Takahashi, Kanae
Affiliations: Osaka City Univ, Grad Sch Med, Dept Med Stat, Osaka, Japan; Hyogo Coll Med, Dept Biostat, Nishinomiya, Hyogo, Japan
Yamamoto, Kouji
Affiliation: Yokohama City Univ, Sch Med, Dept Biostat, Yokohama, Kanagawa, Japan
Kuchiba, Aya
Koyama, Tatsuki
Affiliation: Vanderbilt Univ, Med Ctr, Dept Biostat, Nashville, TN 37232 USA