Modeling and predicting the popularity of online contents with Cox proportional hazard regression model

Cited by: 52
Authors
Lee, Jong Gun [1]
Moon, Sue [2]
Salamatian, Kave [3]
Affiliations
[1] France Telecom Orange Labs, SENSE Sociol & Econ Networks & Serv Lab, F-92130 Issy Les Moulineaux, France
[2] Korea Adv Inst Sci & Technol, Dept Comp Sci, Taejon 305701, South Korea
[3] LISTIC Polytech Annecy Chambery, F-74944 Annecy Le Vieux, France
Keywords
Popularity of online contents; Survival analysis; Cox proportional hazard regression model;
DOI
10.1016/j.neucom.2011.04.040
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a general framework for modeling and predicting the popularity of online contents. The aim of our modeling is not to infer the precise popularity value of a content, but to infer the likelihood with which the content will be popular. Our approach is rooted in survival analysis, which deals with the survival time until an event such as a failure or death. Survival analysis assumes that predicting the precise lifetime of an instance is very hard, but that predicting the likelihood of the lifetime of an instance is possible based on its hazard distribution. Additionally, we take the standpoint of an external observer who has to model the popularity of contents using only publicly available information. Thus, the goal of our proposed methodology is to model a certain popularity metric, such as the lifetime of a content or the number of comments that a content receives, with a set of explanatory factors that are observable by the external observer. Among the various parametric and non-parametric approaches to survival analysis, we use the Cox proportional hazard regression model, which divides the distribution function of a certain popularity metric into two components: one that is explained by a set of explanatory factors, called risk factors, and another, a baseline survival distribution function, that integrates all the factors not taken into account. In order to validate our proposed methodology, we use two datasets crawled from two different discussion forums, forum.dpreview.com and forums.myspace.com, which are, respectively, one of the largest discussion forums dealing with various issues on digital cameras and a discussion forum provided by a representative social network. We model two different popularity metrics, the lifetime of threads and the number of comments, and we show that the models can predict the lifetime of threads from Dpreview by observing a thread during its first 5-6 days (24 h for Myspace), and the number of comments of Dpreview threads by observing a thread during its first 2-3 days. © 2011 Published by Elsevier B.V.
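The two-component split described in the abstract corresponds to the standard Cox proportional hazards form, in which the survival function for risk factors x is S(t | x) = S0(t) ^ exp(beta . x), with S0 the baseline survival function. The following is a minimal sketch of that relationship only, not the authors' implementation: the function name `cox_survival`, the baseline values, the coefficients, and the risk-factor values are all hypothetical and chosen purely for illustration.

```python
import math


def cox_survival(baseline_survival, beta, x):
    """Return S(t | x) = S0(t) ** exp(beta . x) for each baseline value.

    baseline_survival: baseline survival probabilities S0(t) on a time grid
    beta: regression coefficients, one per risk factor
    x: observed risk-factor values for one content item
    """
    # Linear risk score: the part explained by the risk factors.
    risk_score = sum(b * xi for b, xi in zip(beta, x))
    # Proportional hazards: the baseline curve is raised to exp(risk_score).
    return [s0 ** math.exp(risk_score) for s0 in baseline_survival]


# Hypothetical baseline: probability a thread is still active after days 1..5.
baseline = [0.9, 0.7, 0.5, 0.3, 0.1]
# Hypothetical coefficients and risk factors (assumed values, not from the paper).
beta = [-0.5, 0.2]
x = [1.0, 0.0]

curve = cox_survival(baseline, beta, x)
for day, (s0, s) in enumerate(zip(baseline, curve), start=1):
    print(f"day {day}: baseline={s0:.2f} adjusted={s:.3f}")
```

A negative risk score lowers the hazard, so the adjusted curve lies above the baseline; a score of zero reproduces the baseline exactly. In practice the coefficients and the baseline would be estimated from observed data rather than fixed by hand.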
Pages: 134-145
Number of pages: 12