Dynamic personalized pagerank in entity-relation graphs

Dynamic personalized pagerank in entity-relation graphs,10.1145/1242572.1242650,Soumen Chakrabarti

Dynamic personalized pagerank in entity-relation graphs   (Citations: 39)
BibTex | RIS | RefWorks Download
Extractors and taggers turn unstructured text into entity- relation (ER) graphs where nodes are entities (email, pa- per, person, conference, company) and edges are relations (wrote, cited, works-for). Typed proximity search of the form type=person NEAR company "IBM", paper "XML" is an increasingly useful search paradigm in ER graphs. Prox- imity search implementations either perform a Pagerank-like computation at query time, which is slow, or precompute, store and combine per-word Pageranks, which can be very expensive in terms of preprocessing time and space. We present HubRank, a new system for fast, dynamic, space- ecient proximity searches in ER graphs. During prepro- cessing, HubRank computes and indexes certain "sketchy" random walk fingerprints for a small fraction of nodes, care- fully chosen using query log statistics. At query time, a small "active" subgraph is identified, bordered by nodes with indexed fingerprints. These fingerprints are adaptively loaded to various resolutions to form approximate person- alized Pagerank vectors (PPVs). PPVs at remaining active nodes are now computed iteratively. We report on experi- ments with CiteSeer's ER graph and millions of real Cite- Seer queries. Some representative numbers follow. On our testbed, HubRank preprocesses and indexes 52 times faster than whole-vocabulary PPV computation. A text index occupies 56MB. Whole-vocabulary PPVs would consume 102GB. If PPVs are truncated to 56MB, precision com- pared to true Pagerank drops to 0.55; in contrast, HubRank has precision 0.91 at 63MB. HubRank's average query time is 200-300 milliseconds; query-time Pagerank computation takes 11 seconds on average.
Conference: World Wide Web Conference Series - WWW , pp. 571-580, 2007
Cumulative Annual
View Publication
The following links allow you to view full publications. These links are maintained by other sources not affiliated with Microsoft Academic Search.
Sort by: