EntityBases: Compiling, Organizing and Querying Massive Entity Repositories
Craig A. Knoblock, José Luis Ambite, Kavita Ganesan, Maria Muslea, Steven Minton, Greg Barish, Evan Gamble, Claude Nanjo, Kane See, Cyrus Shahabi, Ching-chien Chen
Current approaches for linking information across sources, often called record linkage, require finding attributes common to the sources and comparing records on those attributes. This frequently leads to unsatisfactory results because the sources are often missing information or contain incorrect or outdated information. We are addressing this problem by developing the technology to build massive entity knowledge bases, which we call EntityBases. The key idea is to create a comprehensive knowledge base for the entities of interest (e.g., companies). To build such a knowledge base, we must address two issues: linking entities with multi-valued attributes obtained from heterogeneous sources, and providing a virtual repository that can be queried efficiently. This paper describes how we have addressed these issues and shows how an EntityBase™ can be used for understanding and linking text documents.
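The contrast drawn above can be illustrated with a minimal sketch. Conventional record linkage compares one record against another on shared attributes. An entity that accumulates every known value for each attribute (for example, both a former and a current address) can still match a source record that carries outdated information. The `Entity` class, the `match_score` agreement fraction and the `best_match` helper below are hypothetical illustrations, not the linkage method of the system described in this paper. A realistic matcher would use fuzzy string comparison and attribute weighting rather than exact equality after normalization.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace (a deliberately simple, hypothetical normalizer)."""
    return " ".join(value.lower().split())


@dataclass
class Entity:
    """Hypothetical EntityBase-style entity: each attribute maps to a *set* of known values."""
    entity_id: str
    attributes: dict[str, set[str]] = field(default_factory=dict)

    def add(self, attr: str, value: str) -> None:
        # Accumulate values instead of overwriting them, so old and new facts coexist.
        self.attributes.setdefault(attr, set()).add(normalize(value))


def match_score(record: dict[str, str], entity: Entity) -> float:
    """Fraction of the record's non-empty attributes that equal any known value of the entity."""
    populated = {k: v for k, v in record.items() if v}
    if not populated:
        return 0.0
    hits = sum(
        1 for k, v in populated.items()
        if normalize(v) in entity.attributes.get(k, set())
    )
    return hits / len(populated)


def best_match(record: dict[str, str], entities: list[Entity]) -> Entity | None:
    """Return the highest-scoring entity, or None when there are no candidates."""
    return max(entities, key=lambda e: match_score(record, e), default=None)


# Illustrative data only: one entity known under two names and two addresses.
acme = Entity("acme")
for name in ("Acme Corp", "Acme Corporation"):
    acme.add("name", name)
for addr in ("12 Old Rd", "99 New Ave"):  # former and current address
    acme.add("address", addr)

globex = Entity("globex")
globex.add("name", "Globex Inc")
globex.add("address", "1 Main St")

# A source record with an outdated address and a missing phone number.
record = {"name": "ACME Corporation", "address": "12 Old Rd", "phone": ""}
print(best_match(record, [acme, globex]).entity_id)  # acme
print(match_score(record, acme))  # 1.0
```

Because the entity keeps the former address alongside the current one, the stale record still agrees on every attribute it supplies; a comparison against only the current address would have missed that agreement. Empty attributes are skipped rather than counted as mismatches, which reflects the abstract's point that sources are often missing information.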