Unsupervised named-entity extraction from the Web: An experimental study

Unsupervised named-entity extraction from the Web: An experimental study,10.1016/j.artint.2005.03.001,Artificial Intelligence,Oren Etzioni,Michael J.

Citations: 269
The KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domain-independent, and scalable manner. The paper presents an overview of KNOWITALL's novel architecture and design principles, emphasizing its distinctive ability to extract information without any hand-labeled training examples. In its first major run, KNOWITALL extracted over 50,000 class instances, but suggested a challenge: How can we improve KNOWITALL's recall and extraction rate without sacrificing precision? This paper presents three distinct ways to address this challenge and evaluates their performance. Pattern Learning learns domain-specific extraction rules, which enable additional extractions. Subclass Extraction automatically identifies sub-classes in order to boost recall (e.g., "chemist" and "biologist" are identified as sub-classes of "scientist"). List Extraction locates lists of class instances, learns a "wrapper" for each list, and extracts elements of each list. Since each method bootstraps from KNOWITALL's domain-independent methods, the methods also obviate hand-labeled training examples. The paper reports on experiments, focused on building lists of named entities, that measure the relative efficacy of each method and demonstrate their synergy. In concert, our methods gave KNOWITALL a 4-fold to 8-fold increase in recall at precision of 0.90, and discovered over 10,000 cities missing from the Tipster Gazetteer.
Journal: Artificial Intelligence - AI, vol. 165, no. 1, pp. 91-134, 2005
    • ...[16, 5], stat-snowball [37], readtheweb [9], and yago-naga [29], as well as commercial endeavors such as wolframalpha.com, freebase.com, and trueknowledge.com...
    • ...[6, 2, 12, 16, 5, 7, 36, 8]) is bootstrapped with seed facts for given relations and automatically iterates, in an almost unsupervised manner, between collecting text patterns that contain facts and finding new fact candidates that co-occur with patterns...

    Ndapandula Nakashole et al. Scalable knowledge harvesting with high precision and high recall

    • ...Even being supervised, other systems like KnowItAll [15] provide a higher level of automation...
    • ...It is also worth mentioning that, unlike previous approaches [10,15], discovered entities will be associated to classes of an input domain ontology in order to offer more structured and easily interpretable annotations...
    • ...From an unsupervised point-of-view, this can be done by evaluating the degree of similarity between entities and subsumer concepts from the statistical estimation of their co-occurrence [15]...

    David Sánchez et al. Content annotation for the semantic web: An automatic web-based approa...

    • ...The KnowItAll system [7] uses the pointwise mutual information (PMI) of the name and a context discriminator, such as “X is a city” as the features of a Naïve Bayes classifier to classify entities into predefined classes...
    • ...Our method in this work is based on pointwise mutual information (PMI) [6], a popular metric for measuring the relatedness of two terms in text corpus, and is adapted by KnowItAll system [7]...
    • ...The findings of the big impact of EDFs, CDFs, FCD measures, and nonlinear classifiers are important contributions of our work, since FCG achieves much better results after considering well these factors, which are not systematically investigated in prior research on corpus statistics methods, such as the works [8], [7]...

    Yanpeng Li et al. A Framework for Semisupervised Feature Generation and Its Applications...
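
As a rough illustration of the PMI measure mentioned in the excerpts above, the sketch below computes pointwise mutual information from raw co-occurrence counts, assuming the standard definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The function name `pmi`, its parameters, and the counts used are hypothetical; this is not taken from KnowItAll or from the citing paper, and either system may define or scale the score differently.

```python
import math


def pmi(joint_count, count_x, count_y, total):
    """Pointwise mutual information (base 2) from raw counts.

    joint_count: number of observations containing both x and y
    count_x, count_y: number of observations containing x, y respectively
    total: total number of observations
    All names and the count-based estimate are illustrative assumptions.
    """
    if count_x <= 0 or count_y <= 0 or total <= 0:
        raise ValueError("marginal counts and total must be positive")
    if joint_count <= 0:
        # x and y never co-occur: PMI tends to negative infinity.
        return float("-inf")
    # p(x,y) / (p(x) p(y)) simplifies to joint * total / (count_x * count_y).
    return math.log2(joint_count * total / (count_x * count_y))


# Hypothetical counts: a name appears 100 times, a discriminator phrase
# such as "X is a city" appears 100 times, and they co-occur 20 times
# in 1000 observations.
score = pmi(20, 100, 100, 1000)
```

A score above zero indicates that the name and the discriminator co-occur more often than independence would predict; such scores could then serve as classifier features, as the excerpt describes.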

    • ...Bootstrapping-based relation extraction [1,3,4,5,6] leverage large amounts of data on the Web efficiently...
    • ...KnowItAll[4] and Espresso[5] is also bootstrapping-based system, but a different type of pattern evaluation method is used...

    Haibo Li et al. Using Graph Based Method to Improve Bootstrapping Relation Extraction

    • ...example, KnowItAll [28] is a named entity recognition system...
    • ...of each simple noun phrase (NP) in the list NPList2 is a member of the class named in NP1 [28]...
    • ...of such unsupervised systems (for example [28]) is that it can...

    Pir Abdul Rasool Qureshi et al. LanguageNet: A novel framework for processing unstructured text inform...
