A measure of similarity between graph vertices: Applications to synonym extraction and web searching

Cited by: 229
Authors
Blondel, VD
Gajardo, A
Heymans, M
Senellart, P
Van Dooren, P
Affiliations
[1] Catholic Univ Louvain, Div Appl Math, B-1348 Louvain, Belgium
[2] Univ Concepcion, Dept Ingn Matemat, Concepcion, Chile
[3] Google Inc, Mountain View, CA 94043 USA
[4] Ecole Normale Super, Dept Comp Sci, F-75230 Paris 05, France
Keywords
algorithms; graph algorithms; graph theory; eigenvalues of graphs
DOI
10.1137/S0036144502415960
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Subject classification code
070104
Abstract
We introduce a concept of similarity between vertices of directed graphs. Let $G_A$ and $G_B$ be two directed graphs with, respectively, $n_A$ and $n_B$ vertices. We define an $n_B \times n_A$ similarity matrix $S$ whose real entry $s_{ij}$ expresses how similar vertex $j$ (in $G_A$) is to vertex $i$ (in $G_B$): we say that $s_{ij}$ is their similarity score. The similarity matrix can be obtained as the limit of the normalized even iterates of $S_{k+1} = B S_k A^T + B^T S_k A$, where $A$ and $B$ are the adjacency matrices of the graphs and $S_0$ is a matrix whose entries are all equal to 1. In the special case where $G_A = G_B = G$, the matrix $S$ is square and the score $s_{ij}$ is the similarity score between the vertices $i$ and $j$ of $G$. We point out that Kleinberg's "hub and authority" method to identify web pages relevant to a given query can be viewed as a special case of our definition in the case where one of the graphs has two vertices and a unique directed edge between them. In analogy to Kleinberg, we show that our similarity scores are given by the components of a dominant eigenvector of a nonnegative matrix. Potential applications of our similarity concept are numerous. We illustrate an application for the automatic extraction of synonyms in a monolingual dictionary.
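The iteration described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes that `A[i, j] = 1` means a directed edge from vertex `i` to vertex `j`, and that "normalized" refers to the Frobenius norm. The function name `similarity_matrix` and the parameters `tol` and `max_pairs` are hypothetical, introduced only for this example. The usage part reproduces the hub-and-authority special case, taking $G_A$ to be the two-vertex graph with a single edge.

```python
import numpy as np


def similarity_matrix(A, B, tol=1e-10, max_pairs=1000):
    """Approximate the limit of the normalized even iterates of
    S_{k+1} = B S_k A^T + B^T S_k A, starting from S_0 = all ones.

    Assumptions (not stated in the abstract): Frobenius normalization,
    and a stopping rule that compares successive even iterates.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # S has one row per vertex of G_B and one column per vertex of G_A.
    S = np.ones((B.shape[0], A.shape[0]))
    S /= np.linalg.norm(S)  # Frobenius norm for a 2-D array
    for _ in range(max_pairs):
        S_prev = S
        # Two update steps per pass, so only even iterates are compared.
        for _ in range(2):
            S = B @ S @ A.T + B.T @ S @ A
            norm = np.linalg.norm(S)
            if norm == 0.0:
                return S  # degenerate case, e.g. a graph with no edges
            S = S / norm
        if np.linalg.norm(S - S_prev) < tol:
            break
    return S


# Hub-and-authority special case: G_A has two vertices and one edge 0 -> 1.
A = np.array([[0, 1],
              [0, 0]])
# Example graph G_B: vertices 0 and 1 both point to vertex 2.
B = np.array([[0, 0, 1],
              [0, 0, 1],
              [0, 0, 0]])
S = similarity_matrix(A, B)
hub_scores = S[:, 0]        # similarity to the source vertex of G_A
authority_scores = S[:, 1]  # similarity to the target vertex of G_A
print(np.round(S, 4))
```

In this small example the two vertices that point to vertex 2 receive equal hub scores and vertex 2 receives the only nonzero authority score. The odd iterates settle on a different matrix than the even ones here, which is why the limit is taken over even iterates only.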
Pages: 647-666
Page count: 20