StatSnowball: a statistical approach to extracting entity relationships

Authors: Jun Zhu, Zaiqing Nie, Xiaojiang Liu, Bo Zhang, Ji-rong Wen
DOI: 10.1145/1526709.1526724

Citations: 16
Traditional relation extraction methods require pre-specified relations and relation-specific human-tagged examples. Bootstrapping systems significantly reduce the number of training examples, but they usually apply heuristic-based methods to combine a set of strict hard rules, which limit the ability to generalize and thus generate a low recall. Furthermore, existing bootstrapping methods do not perform open information extraction (Open IE), which can identify various types of relations without requiring pre-specifications. In this paper, we propose a statistical extraction framework called Statistical Snowball (StatSnowball), which is a bootstrapping system and can perform both traditional relation extraction and Open IE. StatSnowball uses the discriminative Markov logic networks (MLNs) and softens hard rules by learning their weights in a maximum likelihood estimate sense. MLN is a general model, and can be configured to perform different levels of relation extraction. In StatSnowball, pattern selection is performed by solving an ℓ1-norm penalized maximum likelihood estimation, which enjoys well-founded theories and efficient solvers. We extensively evaluate the performance of StatSnowball in different configurations on both a small but fully labeled data set and large-scale Web data. Empirical results show that StatSnowball can achieve a significantly higher recall without sacrificing the high precision during iterations with a small number of seeds, and the joint inference of MLN can improve the performance. Finally, StatSnowball is efficient and we have developed a working entity relation search engine called Renlifang based on it.
Conference: World Wide Web Conference Series - WWW, pp. 101-110, 2009
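The abstract states that pattern selection is done by solving an ℓ1-norm penalized maximum likelihood estimation. The sketch below illustrates that general idea only, and is not the paper's method: it swaps the Markov logic network for a plain logistic regression over binary "pattern fired" features, and fits it with proximal gradient descent (soft-thresholding). The toy data, the three pattern columns, and the values of `lam`, `step`, and `iters` are all assumptions made for illustration.

```python
import math

def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_l1_logistic(X, y, lam=0.1, step=0.5, iters=500):
    """Minimise mean negative log-likelihood + lam * ||w||_1 (no bias term)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # Gradient of the mean negative log-likelihood at the current w.
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad[j] += err * xi[j] / n
        # Gradient step followed by soft-thresholding (the l1 proximal step).
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad)]
    return w

# Hypothetical toy data: each column is a candidate pattern (1 = it fired).
# Column 0 fires only on true relation instances, column 1 fires equally
# often on true and false instances, column 2 fires only on false ones.
X = [
    [1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 0, 0],   # true instances
    [0, 1, 1], [0, 0, 1], [0, 1, 0], [0, 0, 0],   # false instances
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w = fit_l1_logistic(X, y)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("weights:", [round(wj, 3) for wj in w])
print("selected patterns:", selected)
```

In this toy setting the penalty leaves the uninformative middle pattern with a weight of exactly zero, so it drops out of the selected set, while the two informative patterns keep nonzero weights. That sparsity is the property that makes an ℓ1 penalty usable as a selection step.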
    • ...[16, 5], stat-snowball [37], readtheweb [9], and yago-naga [29], as well as commercial endeavors such as wolframalpha.com, freebase.com, and trueknowledge.com...
    • ...Reasoning-enhanced systems (e.g., [30, 37, 9]) check the plausibility of the extracted fact candidates by their mutual consistency based on specified logical constraints...
    • ...[23, 15, 11, 37, 9, 24]) aim to overcome these problems by incorporating probabilistically weighted rules for coupling the random variables that denote whether fact candidates are true or false, and then using joint inference over all fact candidates together...

    Ndapandula Nakashole et al. Scalable knowledge harvesting with high precision and high recall

    • ...Bootstrapping-based relation extraction [1,3,4,5,6] leverage large amounts of data on the Web efficiently...
    • ...The StatSnowball [6] extends the Snowball using statistical methods and extracts entity pairs and keywords around the entities...

    Haibo Li et al. Using Graph Based Method to Improve Bootstrapping Relation Extraction

    • ...In object-level search engines such as Renlifang 1 [38], it is particularly important to mine entity relations from the Web to build automatically, an entity relation graph to link all the extracted information together...
    • ...The proposed method (PROP) is compared against previous works on Open IE using the micro-average precision, recall and F-scores, as shown in Table 5. In Table 5, O-NB is the naive Bayes relation classifier described in [4], O-CRF is the conditional random field-based Open IE system described in [5], and MLN is the Markov logic network-based Open IE system described in [38]...
    • ...Bootstrapping methods [1, 9, 15, 25, 38] to relation extraction are attractive because they require markedly fewer training instances than supervised approaches do. Bootstrapping methods are initialized with a few instances (often referred to as seeds) of the target relation [1, 25, 38] or general extraction templates [15]...

    Danushka Tarupathi Bollegala et al. Relational duality: unsupervised extraction of semantic relations betw...

    • ...For example, “George W. Bush” is a popular object in the dataset of Wikipedia, and it constitutes a relation between “Junichiro Koizumi” and “Condoleezza Rice.” Zhu et al. [20] extract explicit relations between pairs of people from the Web...

    Xinpeng Zhang et al. Analysis of Implicit Relations on Wikipedia: Measuring Strength throug...

    • ...Notable endeavors along these lines of a “Semantic Wikipedia” and a “machine-readable Web” include academic projects such as DBpedia [3], YAGO [21], Text2Onto [9], sindice/sig.ma [23], KnowItAll/TextRunner [13], IntelligenceInWikipedia [25], ReadTheWeb [6], Omnivore [5], StatSnowball [28], or DBLife [11], and also commercial endeavors like freebase.com, trueknowledge.com, wolframalpha.com, www.google.com/squared, or ...
    • ...Learning and reasoning-based methods include the StatSnowball [28], a powerful machinery for fact harvesting that makes intensive use of Markov Logic Networks (MLNs) and Conditional Random Fields (CRFs)...

    Ndapandula Nakashole et al. Find your Advisor: Robust Knowledge Gathering from the Web
