Glossary extraction and utilization in the information search and delivery system for IBM Technical Support

DOI: 10.1147/sj.433.0546

Citations: 9
In this paper we describe the practical aspects of extracting and using a glossary for a selected technical domain. We first describe the existing glossary extraction process, as applied to general corpora, and examine its shortcomings in the technical support domain. Then we propose a number of enhancements to it, including focusing the glossary on a selected domain context, providing support for multidomain glossaries, and importing domain-specific dictionaries. We apply our focused-glossary approach to the IBM Technical Support corpus and incorporate resulting glossaries within the information search and delivery system used by IBM Technical Support. We demonstrate the effectiveness of our approach by evaluating the quality of keywords and terms extracted from sample documents with the help of these glossaries.
Journal: IBM Systems Journal (IBMSJ), vol. 43, no. 3, pp. 546-563, 2004
    • ... circumscribed to frequency-based approaches and the use of reference corpora: the classic TFIDF used in (Evans & Lefferts 1995; Medelyan & Witten 2006); the notion of ‘weirdness’ as introduced in (Ahmad, Gillam, & Tostevin 1999), which compares the term frequency in the corpus with its frequency in a reference corpus from a different domain; and measures such as ‘domain pertinence’ in (Sclano & Velardi 2007) and ‘domain spe ...
    • ... from a different domain; and measures such as ‘domain pertinence’ in (Sclano & Velardi 2007) and ‘domain specificity’ in (Kozakov et al. 2004; Park, Byrd, & Boguraev 2002), which extend and revise ‘weirdness.’ The trend in recent research is to use hybrid approaches, in which ‘unithood’ and ‘termhood’ are combined to produce an unified indicator, such as ‘C-value’ (Frantzi & Ananiadou 1999), and many others (Fahmi, Bouma, & van ...
    • ...To our best knowledge, the methods presented by (Ahmad, Gillam, & Tostevin 1999; Frantzi & Ananiadou 1999; Kozakov et al. 2004; Park, Byrd, & Boguraev 2002; Sclano & Velardi 2007) are capable of recognising both single- and multi-word terms and do not apply frequency thresholds...
    • ...We implemented and compared five algorithms: TF-IDF (as a baseline), ‘weirdness’ (Ahmad, Gillam, & Tostevin 1999), ‘C-value’ (Frantzi & Ananiadou 1999), ‘Glossex’ (Kozakov et al. 2004; Park, Byrd, & Boguraev 2002) and ‘TermExtractor’ (Termex) (Sclano & Velardi 2007)...

    Ziqi Zhang et al. A Comparative Evaluation of Term Recognition Algorithms
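
    The ‘weirdness’ measure quoted in the snippets above compares a term's frequency in the domain corpus with its frequency in a reference corpus from a different domain. A minimal Python sketch of that idea follows. It is an illustration only, not the implementation from any of the cited papers: the function name, the token-list inputs, and the add-one smoothing for terms missing from the reference corpus are all assumptions made here.

```python
from collections import Counter


def weirdness(term, domain_tokens, reference_tokens):
    """Ratio of a term's relative frequency in a domain corpus to its
    relative frequency in a reference corpus.

    Assumption of this sketch: add-one smoothing on the reference side,
    so a term absent from the reference corpus does not divide by zero.
    """
    if not domain_tokens:
        raise ValueError("domain corpus must not be empty")

    domain_counts = Counter(domain_tokens)
    reference_counts = Counter(reference_tokens)

    # Relative frequency of the term in the domain corpus.
    domain_rel = domain_counts[term] / len(domain_tokens)
    # Smoothed relative frequency of the term in the reference corpus.
    reference_rel = (reference_counts[term] + 1) / (len(reference_tokens) + 1)

    return domain_rel / reference_rel


# Hypothetical toy corpora, already tokenized.
domain = ["kernel", "panic", "the", "kernel"]
reference = ["the", "cat", "the", "dog"]

kernel_score = weirdness("kernel", domain, reference)
the_score = weirdness("the", domain, reference)
```

    In this toy example the domain-specific token "kernel" scores higher than the common word "the", which is the behaviour the quoted description implies. Measures such as ‘domain specificity’ are described above as extending and revising this ratio; they are not reproduced here.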

    • ...Several methods to automatically extract technical terms from domain-specific document warehouses have been described in the literature, e.g., [4,5,6,7,8]...
    • ...To the best of our knowledge, the only terminology learning application with comparable complexity of the extraction algorithms is the IBM Glossex system ([7,8])...
    • ...In [7,8], as in virtually all papers on terminology extraction [4,5,6], the validation is conducted manually by three judges (usually the authors themselves)...

    F. Sclano et al. TermExtractor: a Web Application to Learn the Shared Terminology of Em...

    • ...The results of ATR have also been successfully applied in information retrieval, machine translation and many other domains [12, 16, 6]...
    • ...Glossex Method [12] is based on two heuristics...

    Petr Knoth et al. Towards a Framework for Comparing Automatic Term Recognition Methods

    • ...Glossary building is often considered as an extension of term extraction, many systems start with the identification of relevant term in a certain domain [7] and then try to apply different techniques in order to build glossary for that specific domain...
    • ...The use of information about the domain is also used in [7] in order to improve results...

    Rosa Del Gaudio et al. Supporting e-learning with automatic glossary extraction
