Author name disambiguation using a graph model with node splitting and merging based on bibliographic information

Cited by: 2
Authors
Dongwook Shin
Taehwan Kim
Joongmin Choi
Jungsun Kim
Affiliation
[1] Department of Computer Science and Engineering, Hanyang University
Source
Scientometrics | 2014 / Vol. 100
Keywords
Author name disambiguation; Graph model; Namesake resolution; Heteronymous name resolution; Digital library
DOI
Not available
Abstract
Author ambiguity mainly arises when several different authors write their names in the same way, generally known as the namesake problem, and when the name of a single author is written in many different ways, referred to as the heteronymous name problem. These author ambiguity problems have long been an obstacle to efficient information retrieval in digital libraries, causing incorrect identification of authors and impeding the correct classification of their publications. Distinguishing these authors is nontrivial, especially when very little information about them is available. In this paper, we propose a graph-based approach to author name disambiguation, in which a graph model is constructed from co-author relations and author ambiguity is resolved by graph operations such as vertex (or node) splitting and merging based on co-authorship. In our framework, called the Graph Framework for Author Disambiguation (GFAD), the namesake problem is solved by splitting an author vertex involved in multiple cycles of co-authorship, and the heteronymous name problem is handled by merging multiple author vertices with similar names if those vertices are connected to a common vertex. Experiments were carried out on the real DBLP and Arnetminer collections, and the performance of GFAD was compared with that of three representative unsupervised author name disambiguation systems. The results show that GFAD achieves better overall performance than these systems on representative evaluation metrics. An additional contribution is that we have released the refined DBLP collection to the public to facilitate building a performance benchmark for future author disambiguation systems.
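The abstract describes the two graph operations only at a high level. The Python sketch below is a toy illustration under stated assumptions, not the authors' GFAD implementation. It builds an undirected co-author graph (each paper adds an edge between every pair of its authors), merges two vertices when their names are "similar" and they share a common co-author, and splits a vertex whose co-authors fall into at least two separate cycle groups. All function names, the `#1`/`#2` suffixes for split vertices, the name-similarity rule (same first initial and surname), keeping the longer spelling on a merge, and the reading of "multiple cycles" as "at least two groups of two or more co-authors that stay connected to each other once the ambiguous vertex is removed" are hypothetical choices made for illustration.

```python
# Hypothetical toy sketch of GFAD-style merging and splitting on a co-author
# graph. Not the authors' implementation; all names and rules are assumptions.

Graph = dict[str, set[str]]  # author name -> set of co-author names


def add_paper(g: Graph, authors: list[str]) -> None:
    """Add one paper: every pair of its authors becomes a co-author edge."""
    for a in authors:
        g.setdefault(a, set()).update(b for b in authors if b != a)


def name_key(name: str) -> tuple[str, str]:
    """Assumed similarity rule: same first initial and same surname."""
    tokens = name.split()
    return tokens[0][0].lower(), tokens[-1].lower()


def find_merge_pair(g: Graph):
    """Find two similar-name, non-adjacent vertices sharing a common neighbour."""
    for common in sorted(g):
        nbrs = sorted(g[common])
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                # Two names on the same paper are assumed to be different people.
                if b not in g[a] and name_key(a) == name_key(b):
                    return a, b
    return None


def merge_heteronyms(g: Graph) -> None:
    """Repeatedly merge similar-name vertices connected to a common vertex."""
    while (pair := find_merge_pair(g)) is not None:
        keep, drop = sorted(pair, key=len, reverse=True)  # keep longer spelling
        for u in g.pop(drop):
            g[u].discard(drop)
            g[u].add(keep)
            g[keep].add(u)


def neighbour_groups(g: Graph, v: str) -> list[set[str]]:
    """Group v's co-authors by connected component of the graph without v.

    Two co-authors share a group exactly when a cycle passes through both
    of them and v.
    """
    seen: set[str] = set()
    groups = []
    for start in sorted(g[v]):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # depth-first search that never enters v
            u = stack.pop()
            component.add(u)
            for w in g[u]:
                if w != v and w not in seen:
                    seen.add(w)
                    stack.append(w)
        groups.append(component & g[v])
    return groups


def split_namesake(g: Graph, v: str) -> list[str]:
    """Split v into one vertex per cyclic co-author group if there are >= 2."""
    groups = neighbour_groups(g, v)
    cyclic = [grp for grp in groups if len(grp) >= 2]
    if len(cyclic) < 2:
        return [v]  # fewer than two cycle groups: leave v unchanged
    # Co-authors on no cycle with v stay on a residual vertex named v.
    residual = set().union(*(grp for grp in groups if len(grp) < 2))
    for u in g.pop(v):
        g[u].discard(v)
    new_names = []
    for i, grp in enumerate(cyclic, start=1):
        name = f"{v}#{i}"
        new_names.append(name)
        g[name] = set(grp)
        for u in grp:
            g[u].add(name)
    if residual:
        g[v] = residual
        for u in residual:
            g[u].add(v)
    return new_names


if __name__ == "__main__":
    g: Graph = {}
    add_paper(g, ["Bob Jones", "Alice Smith"])
    add_paper(g, ["Bob Jones", "A. Smith", "Carol White"])
    merge_heteronyms(g)
    print(sorted(g["Alice Smith"]))  # ['Bob Jones', 'Carol White']

    h: Graph = {}
    add_paper(h, ["J. Lee", "Amy Park", "Ben Cho"])
    add_paper(h, ["J. Lee", "Cal Han", "Dan Yoo"])
    print(split_namesake(h, "J. Lee"))  # ['J. Lee#1', 'J. Lee#2']
```

Grouping a vertex's neighbours by connected component of the graph with that vertex removed is a standard way to find which of its edges lie on common cycles, which is why the sketch uses it as a stand-in for the cycle criterion. Because every paper forms a clique, a single paper with three or more authors already creates a cycle here, and the fixed name rule will over-merge distinct people who share a common name and a co-author; a practical system would need stronger evidence for both operations.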
Pages: 15-50
Number of pages: 35
Related papers
50 in total
  • [1] Author name disambiguation using a graph model with node splitting and merging based on bibliographic information
    Shin, Dongwook
    Kim, Taehwan
    Choi, Joongmin
    Kim, Jungsun
    SCIENTOMETRICS, 2014, 100 (01) : 15 - 50
  • [2] Author Name Disambiguation Using Graph Node Embedding Method
    Zhang, Wenjing
    Yan, Zhongmin
    Zheng, Yongqing
    PROCEEDINGS OF THE 2019 IEEE 23RD INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN (CSCWD), 2019, : 410 - 415
  • [3] Author Name Disambiguation Based on Heterogeneous Graph
    Ma, Chuang
    Xia, Helong
    JOURNAL OF COMPUTERS (TAIWAN), 2023, 34 (04) : 41 - 52
  • [4] Whois? Deep Author Name Disambiguation Using Bibliographic Data
    Boukhers, Zeyd
    Asundi, Nagaraj Bahubali
    LINKING THEORY AND PRACTICE OF DIGITAL LIBRARIES (TPDL 2022), 2022, 13541 : 201 - 215
  • [5] Using Web Information for Author Name Disambiguation
    Pereira, Denilson Alves
    Ribeiro-Neto, Berthier
    Ziviani, Nivio
    Laender, Alberto H. F.
    Goncalves, Marcos Andre
    Ferreira, Anderson A.
    JCDL 09: PROCEEDINGS OF THE 2009 ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES, 2009, : 49 - 58
  • [6] Bibliographic Name Disambiguation with Graph Convolutional Network
    Yan, Hao
    Peng, Hao
    Li, Chen
    Li, Jianxin
    Wang, Lihong
    WEB INFORMATION SYSTEMS ENGINEERING - WISE 2019, 2019, 11881 : 538 - 551
  • [7] A Graph-Based Author Name Disambiguation Method and Analysis via Information Theory
    Ma, Yingying
    Wu, Youlong
    Lu, Chengqiang
    ENTROPY, 2020, 22 (04)
  • [8] A knowledge graph embeddings based approach for author name disambiguation using literals
    Santini, Cristian
    Gesese, Genet Asefa
    Peroni, Silvio
    Gangemi, Aldo
    Sack, Harald
    Alam, Mehwish
    SCIENTOMETRICS, 2022, 127 (08) : 4887 - 4912
  • [10] LUCID: Author Name Disambiguation using Graph Structural Clustering
    Hussain, Ijaz
    Asghar, Sohail
    PROCEEDINGS OF THE 2017 INTELLIGENT SYSTEMS CONFERENCE (INTELLISYS), 2017, : 406 - 413