Avatar Information Extraction System

T. S. Jayram, Rajasekar Krishnamurthy, Sriram Raghavan, Shivakumar Vaithyanatha

Citations: 54
The AVATAR Information Extraction System (IES) at the IBM Almaden Research Center enables high-precision, rule-based information extraction from text documents. Drawing from our experience, we propose the use of probabilistic database techniques as the formal underpinnings of information extraction systems so as to maintain high precision while increasing recall. This involves building a framework where rule-based annotators can be mapped to queries in a database system. We use examples from AVATAR IES to describe the challenges in achieving this goal. Finally, we show that deriving precision estimates in such a database system presents a significant challenge for probabilistic database systems.
Journal: IEEE Data(base) Engineering Bulletin (DEBU), vol. 29, no. 1, pp. 40-48, 2006
Citation Context
    • ...These include information retrieval [21], data integration and cleaning [2,17], text analytics [25,31], social network analysis [1], sensor data management [11,16], financial applications, biological and scientific data management etc...

    Jian Li et al. A unified approach to ranking in probabilistic databases

    • ...In structured information extractors, confidence values are appended to rules for extracting patterns from unstructured data [28]...

    Liwen Sun et al. Mining uncertain data with probabilistic guarantees

    • ...Large amounts of correlated probabilistic data are being generated at a rapidly increasing pace in a wide variety of application domains, including data integration [9], information extraction [15, 13], RFID/sensor network applications [24, 16] and other applications which use machine learning techniques [23] for reasoning over large datasets [7]...
    • ...Information Extraction/Integration: Consider an information extraction/integration system [13, 15, 21, 9] that scans used car advertisements from multiple different sources, such as cars.com, craigslist.com, and autotrader.com and populates a relational database with structured entities (Figure 1). To cope with the enormous amounts of data on the web, the system employs automatic extractors to detect potential tuples...
    • ...Since most web data is in natural language format, incorrect tuples might also be extracted; hence machine learning algorithms based on CRFs [13] and Bayesian inference techniques [15] are used to assign probabilities of correctness/existence to the extracted tuples...
    • ...The clique nodes correspond to maximal cliques in the triangulated PGM and the separator nodes correspond to the cut vertices that separate the maximal cliques [15]...

    Bhargav Kanagal et al. Lineage processing over correlated probabilistic databases

    • ...In structured information extractors, confidence values are appended to rules for extracting patterns from unstructured data [26]...

    Liang Wang et al. Accelerating probabilistic frequent itemset mining: a model-based appr...

    • ...In part, this has been due to the increasing prevalence of applications such as information retrieval [15], data integration and cleaning [2, 11], text analytics [23, 19], and social network analysis [1], where uncertainty arises both because of noisy input data, and because of the statistical inference typically performed on such data...

    Jian Li et al. A Unified Approach to Ranking in Probabilistic Databases
