Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis

Evgeniy Gabrilovich, Shaul Markovitch

Citations: 274
Computing semantic relatedness of natural language texts requires access to vast amounts of common-sense and domain-specific world knowledge. We propose Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia. We use machine learning techniques to explicitly represent the meaning of any text as a weighted vector of Wikipedia-based concepts. Assessing the relatedness of texts in this space amounts to comparing the corresponding vectors using conventional metrics (e.g., cosine). Compared with the previous state of the art, using ESA results in substantial improvements in correlation of computed relatedness scores with human judgments: from r = 0.56 to 0.75 for individual words and from r = 0.60 to 0.72 for texts. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.

We propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic representation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of natural concepts derived from Wikipedia (http://en.wikipedia.org), the largest encyclopedia in existence. We employ text classification techniques that allow us to explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on automatically computing the degree of semantic relatedness between fragments of natural language text.

The contributions of this paper are threefold. First, we present Explicit Semantic Analysis, a new approach to representing semantics of natural language texts using natural concepts. Second, we propose a uniform way for computing relatedness of both individual words and arbitrarily long text fragments. Finally, the results of using ESA for computing semantic relatedness of texts are superior to the existing state of the art. Moreover, using Wikipedia-based concepts makes our model easy to interpret, as we illustrate with a number of examples in what follows.
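
The core idea in the abstract (represent a text as a weighted vector of concepts, then compare vectors with cosine) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the four toy "concepts" stand in for Wikipedia articles, the TF-IDF weighting is one plausible choice of weights, and the names `CONCEPTS`, `build_inverted_index`, `esa_vector`, and `relatedness` are hypothetical. It is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical toy "concepts": stand-ins for Wikipedia articles (assumption).
CONCEPTS = {
    "Computer": "computer software hardware program keyboard mouse",
    "Mouse (animal)": "mouse rodent animal tail cheese",
    "Cat": "cat animal pet mouse fur",
    "Bank": "bank money loan deposit account",
}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def build_inverted_index(concepts):
    """Map each word to {concept name: weight}.

    Weights are TF-IDF, an assumed weighting scheme for this sketch.
    """
    n = len(concepts)
    tf = {name: Counter(tokenize(body)) for name, body in concepts.items()}
    # Document frequency: in how many concept texts each word occurs.
    df = Counter(word for counts in tf.values() for word in counts)
    index = {}
    for name, counts in tf.items():
        for word, count in counts.items():
            idf = math.log(n / df[word])
            index.setdefault(word, {})[name] = count * idf
    return index


def esa_vector(text, index):
    """Represent a text as a weighted vector over concepts."""
    vec = Counter()
    for word in tokenize(text):
        for concept, weight in index.get(word, {}).items():
            vec[concept] += weight
    return dict(vec)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relatedness(text_a, text_b, index):
    """Relatedness of two texts = cosine of their concept vectors."""
    return cosine(esa_vector(text_a, index), esa_vector(text_b, index))


INDEX = build_inverted_index(CONCEPTS)
print(relatedness("mouse", "keyboard", INDEX))
print(relatedness("cat", "money", INDEX))
```

Because single words and longer fragments both map into the same concept space, the same `relatedness` call works for either, which mirrors the "uniform way" of handling words and texts described above. The vectors are also readable: the highest-weighted entries of `esa_vector("mouse", INDEX)` are concept names, not anonymous dimensions.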
    • ...This form of LSA is similar to the use of Wikipedia in (Gabrilovich and Markovitch, 2007)...

    Scott Yih et al. Polarity Inducing Latent Semantic Analysis

    • ...This is similar to recent semantics approaches such as ESA [16]...
    • ...Vector space models based on this ontology have been used by many works for semantic relatedness [16, 32]...
    • ...In common semantic representations (such as ESA [16]) a word is represented as a weighted vector of concepts (derived from Wikipedia articles)...
    • ...Methods compared: We compare our algorithm and representations to the state of the art semantic representation, Explicit Semantic Analysis (ESA), which has been shown to be significantly superior to other approaches [16]...
    • ...This dataset, to the best of our knowledge, is the largest publicly available collection of this kind, which most prior works [16, 37, 36, 35] use in their evaluation...
    • ...with humans ESA-Wikipedia [16] 0.75 ESA-ODP [16] 0.65...
    • ...ESA-Wikipedia [16] 0.75 ESA-ODP [16] 0.65 TSA (Section 3) 0.80...
    • ...with humans ESA-Wikipedia [16] 0.59 TSA (Section 3) 0.63...
    • ...More closely related to our work, Gabrilovich et al. [16] presented an approach to WS that relied on exploiting Wikipedia for "Explicit Semantic Analysis" or ESA, and have demonstrated high correlation with human annotators...

    Kira Radinsky et al. A word at a time: computing word relatedness using temporal semantic a...

    • ...Distributional relatedness measures [11] meet the above requirements but demand the processing of large Web corpora...

    Andre Freitas et al. Querying Linked Data using Semantic Relatedness: A Vocabulary Independ...

    • ...A number of different methods have been devised to use Wikipedia for this purpose, including WikiRelate! [20], Explicit Semantic Analysis (ESA) [7], and Wikipedia Link-based Measure (WLM) [13]...

    Enamul Hoque et al. Conceptual Query Expansion and Visual Search Results Exploration for W...

    • ...The CL-ESA model is an extension of the explicit semantic analysis model (Gabrilovich and Markovitch 2007; Potthast et al. 2008; Yang et al. 1998)...

    Martin Potthast et al. Cross-language plagiarism detection
