Modelling count response variables in informetric studies: Comparison among count, linear, and lognormal regression models

Cited by: 46
Authors
Ajiferuke, Isola [1]
Famoye, Felix [2]
Affiliations
[1] Univ Western Ontario, Fac Informat & Media Studies, London, ON N6A 5B7, Canada
[2] Cent Michigan Univ, Dept Math, Mt Pleasant, MI 48859 USA
Keywords
Count response variable; Linear regression model; Count regression models; Negative binomial regression model; Lognormal regression model; Informetric studies; GENERALIZED POISSON REGRESSION; CITATION COUNTS; QUANTITATIVE CHARACTERISTICS; IMPACT; JOURNALS; ARTICLES; DETERMINANTS; DOWNLOADS; PREDICTION; FREQUENCY
DOI
10.1016/j.joi.2015.05.001
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The purpose of the study is to compare the performance of count regression models with that of linear and lognormal regression models in modelling count response variables in informetric studies. Identified count response variables in informetric studies include the number of authors, the number of references, the number of views, the number of downloads, and the number of citations received by an article. The numbers of links to and from a website are also counts. Data were collected from the United States Patent and Trademark Office (www.uspto.gov), an open access journal (www.informationr.net/ir/), Web of Science, and Maclean's magazine. The datasets were then used to compare the performance of linear and lognormal regression models with that of Poisson, negative binomial, and generalized Poisson regression models. It was found that, owing to overdispersion in most response variables, the negative binomial regression model often seems to be more appropriate for informetric datasets than the Poisson and generalized Poisson regression models. The regression analyses also showed that the linear regression model predicted some negative values for five of the nine response variables modelled, and that for all the response variables it performed worse than both the negative binomial and lognormal regression models when either Akaike's Information Criterion (AIC) or the Bayesian Information Criterion (BIC) was used as the measure of goodness of fit. The negative binomial regression model performed significantly better than the lognormal regression model for four of the response variables, while the lognormal regression model performed significantly better than the negative binomial regression model for two of the response variables; there was no significant difference in the performance of the two models for the remaining three response variables. (C) 2015 Elsevier Ltd. All rights reserved.
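The role that overdispersion and AIC/BIC play in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis: it fits intercept-only Poisson and negative binomial models (no covariates) to a small made-up sample, uses a simple grid search for the negative binomial size parameter, and relies only on the Python standard library. The data and all function names are hypothetical.

```python
import math

def aic(loglik, k):
    """Akaike's Information Criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln L (smaller is better)."""
    return k * math.log(n) - 2 * loglik

def poisson_loglik(y, mu):
    """Log-likelihood of an intercept-only Poisson model with mean mu."""
    return sum(yi * math.log(mu) - mu - math.lgamma(yi + 1) for yi in y)

def negbin_loglik(y, mu, r):
    """Log-likelihood of a negative binomial with mean mu and size r.

    In this parameterisation the variance is mu + mu**2 / r.
    """
    p = r / (r + mu)
    return sum(
        math.lgamma(yi + r) - math.lgamma(r) - math.lgamma(yi + 1)
        + r * math.log(p) + yi * math.log(1 - p)
        for yi in y
    )

def fit_negbin_size(y, mu, lo=1e-3, hi=1e3, steps=2000):
    """Grid search (log scale) for the size r that maximises the likelihood.

    An assumption made for simplicity; a real analysis would use a
    numerical optimiser from a statistics package.
    """
    best_r, best_ll = lo, -math.inf
    for i in range(steps + 1):
        r = lo * (hi / lo) ** (i / steps)
        ll = negbin_loglik(y, mu, r)
        if ll > best_ll:
            best_r, best_ll = r, ll
    return best_r, best_ll

# Hypothetical citation counts, invented for illustration only.
y = [0, 0, 1, 1, 2, 3, 3, 5, 8, 13, 21, 40]
n = len(y)
mean = sum(y) / n
var = sum((yi - mean) ** 2 for yi in y) / (n - 1)

# For both models the maximum-likelihood estimate of the mean is the sample mean.
ll_pois = poisson_loglik(y, mean)
r_hat, ll_nb = fit_negbin_size(y, mean)

aic_pois, bic_pois = aic(ll_pois, 1), bic(ll_pois, 1, n)  # 1 parameter: mu
aic_nb, bic_nb = aic(ll_nb, 2), bic(ll_nb, 2, n)          # 2 parameters: mu, r

print(f"mean={mean:.2f} variance={var:.2f} (variance > mean: overdispersion)")
print(f"Poisson   AIC={aic_pois:.1f} BIC={bic_pois:.1f}")
print(f"Neg. bin. AIC={aic_nb:.1f} BIC={bic_nb:.1f} (size r={r_hat:.3f})")
```

When the sample variance far exceeds the mean, as in this invented sample, the extra dispersion parameter of the negative binomial model typically more than pays for its penalty under both criteria, which is the pattern the abstract reports for most of the informetric response variables.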
Pages: 499-513
Number of pages: 15