Extracting Patterns and Relations from the World Wide Web   (Citations: 353)
The World Wide Web is a vast resource for information. At the same time it is extremely distributed. A particular type of data such as restaurant lists may be scattered across thousands of independent information sources in many different formats. In this paper, we consider the problem of extracting a relation for such a data type from all of these sources automatically. We present a technique which exploits the duality between sets of patterns and relations to grow the target relation starting ...
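The duality described in the abstract can be illustrated with a toy loop that alternates between finding patterns from known pairs and finding new pairs from those patterns. This is only a minimal sketch under stated assumptions, not the paper's algorithm: the four-sentence corpus is invented for illustration, a "pattern" is assumed to be just the literal text between the two items of a pair, and the function names `find_patterns`, `find_pairs`, and `bootstrap` are hypothetical.

```python
import re

def find_patterns(corpus, pairs):
    """Collect the literal text between the two items of each known pair.

    Assumption: a pattern is only the "middle" string; a real system
    would also use surrounding context and score pattern specificity.
    """
    patterns = set()
    for x, y in pairs:
        for sentence in corpus:
            m = re.search(re.escape(x) + r"(.+?)" + re.escape(y), sentence)
            if m:
                patterns.add(m.group(1))
    return patterns

def find_pairs(corpus, patterns):
    """Apply every pattern to the corpus and return the pairs it matches."""
    pairs = set()
    for middle in patterns:
        regex = re.compile(r"(\w+)" + re.escape(middle) + r"(\w+)")
        for sentence in corpus:
            for x, y in regex.findall(sentence):
                pairs.add((x, y))
    return pairs

def bootstrap(corpus, seeds, rounds=3):
    """Grow the relation from the seeds until it stops changing."""
    relation = set(seeds)
    for _ in range(rounds):
        patterns = find_patterns(corpus, relation)
        grown = relation | find_pairs(corpus, patterns)
        if grown == relation:  # fixed point: no new pairs were found
            break
        relation = grown
    return relation

# Invented toy corpus (not data from the paper).
corpus = [
    "Ottawa is the capital of Canada.",
    "Beijing is the capital of China.",
    "Paris is the capital of France.",
    "Canberra, capital city of Australia, is inland.",
]
seeds = {("Ottawa", "Canada")}
result = bootstrap(corpus, seeds)
print(sorted(result))
```

Starting from a single seed pair, the sketch picks up the two other pairs that share the seed's phrasing. It misses the Canberra sentence because no known pair exhibits that wording, which shows why coverage depends on the patterns the seeds happen to expose.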
    • ...In a pattern-based system (e.g., [6, 2, 5, 8]), seed facts like (Germany, FIFA_World_Cup), as an instance of the teamWonTrophy relation between soccer teams and trophies (which was true in 1974 and 1990), can automatically detect textual patterns like “X won the final and became the Y champion” which in turn can discover new facts such as (Spain, FIFA_World_Cup) (which is true for 2010)...
    • ...[6, 2, 12, 16, 5, 7, 36, 8]) is bootstrapped with seed facts for given relations and automatically iterates, in an almost unsupervised manner, between collecting text patterns that contain facts and finding new fact candidates that co-occur with patterns...
    • ...While such feedback loops are well studied for knowledge-harvesting methods that are exclusively pattern-based, such as [6, 2, 5], our approach distinguishes itself from that previous work by including the reasoning phase in each iteration...

    Ndapandula Nakashole et al. Scalable knowledge harvesting with high precision and high recall

    • ...[13,22,26,40,45], semi-supervised methods, e.g. [1,8], and unsupervised method, e.g...

    Cristina Giannone et al. Supervised semantic relation mining from linguistically noisy text doc...

    • ...NLP techniques have been used mainly for named-entity tagging of fixed number of classes or for question-answering of specific question types [39, 11, 22, 40, 8]. Wrapper induction tools generate delimiter-based rules derived from training data [9, 14, 24, 7]. The generated wrappers usually heavily rely on the HTML encoding present in the training data...
    • ...dependent on HTML tagging from training corpus) Brin in [9] introduced an algorithm to extract simple relations from the Web that are similar to a small “training set” of pairs (e.g...

    Michael Gubanov et al. READFAST: Browsing large documents through unified famous objects (UFO...

    • ...These systems typically build on the paradigm of bootstrapping of entity pairs and patterns as proposed by Brin[1]...
    • ...Bootstrapping-based relation extraction [1,3,4,5,6] leverage large amounts of data on the Web efficiently...
    • ...Sergey Brin propose DIPRE system [1] to extract author–book relation from the Web; The Snowball system[3] extracts entity pairs including a predefined relation from a corpus...
    • ...Furthermore, both DIPRE [1] and SatSnowball use a general form to represent extracted patterns...

    Haibo Li et al. Using Graph Based Method to Improve Bootstrapping Relation Extraction

    • ...On the other hand, patterns can be learned implicitly in an iterative process, as in DIPRE [6] and Snowball [3]...
    • ...While no formal principles exist, the informal insight of Pattern-Relation Duality (or PR Duality) has long been observed since DIPRE [6]...
    • ...Extracting tuples of a given relation from a text corpus has long been studied [6, 3, 9]. However, its dual problem of searching textual patterns only exists implicitly as an intermediate step of tuple extraction...
    • ...For problem setting, we rely on bootstrapping using seed tuples, similar to [6, 3]...
    • ...As input, like many existing pattern-based extraction efforts [6, 3], we assume a small number of seed tuples (e.g., {(Ottawa, Canada), (Beijing, China)}), and our ultimate goal is to find the matching relation (e.g., tuples for capital-city-of)...
    • ...As a concluding remark, the conceptual model PRDualRank not only exemplifies the original PR Duality in [6], but also formally quantifies and thus “rediscovers” it (as first stated in Sect...

    Yuan Fang et al. Searching patterns for relation extraction over the web: rediscovering...
