Cross-Language High Similarity Search: Why No Sublinear Time Bound Can Be Expected

Authors: Maik Anderka, Benno Stein, Martin Pottha
DOI: 10.1007/978-3-642-12275-0_66

This paper contributes to an important variant of cross-language information retrieval, called cross-language high similarity search. Given a collection D of documents and a query q in a language different from the language of D, the task is to retrieve the documents in D that are highly similar to q. Use cases for this task include cross-language plagiarism detection and translation search. The current line of research in cross-language high similarity search compares q with the documents in D in a multilingual concept space, which, however, requires a linear scan of D. Monolingual high similarity search can be tackled in sublinear time, either by fingerprinting or by "brute force n-gram indexing", as Web search engines do. We argue that neither fingerprinting nor brute force n-gram indexing can be applied to cross-language high similarity search, and that a linear scan is inevitable. Our findings are based on theoretical and empirical insights.
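To make the contrast in the abstract concrete, the following Python sketch compares the two search strategies on a toy collection. It is illustrative only and is not the authors' implementation. The fingerprint here is a simple min-hash style selection of the k smallest word 3-gram hashes, which is just one of many possible fingerprinting schemes. The multilingual concept space is replaced by a small hypothetical English/German lexicon (`LEXICON`) that maps words to shared concept ids. The names `FingerprintIndex`, `concept_vector`, and `linear_scan`, and the parameters `n`, `k`, and `threshold`, are likewise assumptions made for this illustration.

```python
import hashlib
import math
from collections import Counter, defaultdict


def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams of a text."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def fingerprint(text, n=3, k=4):
    """Min-hash style fingerprint (assumed scheme): the k smallest MD5 hashes of the word n-grams."""
    hashes = sorted(int(hashlib.md5(g.encode("utf-8")).hexdigest(), 16)
                    for g in word_ngrams(text, n))
    return set(hashes[:k])


class FingerprintIndex:
    """Inverted index from fingerprint hash to the ids of documents that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for h in fingerprint(text):
            self.postings[h].add(doc_id)

    def candidates(self, query):
        # Reads only the postings lists of the query's k hashes, not every document.
        found = set()
        for h in fingerprint(query):
            found |= self.postings.get(h, set())
        return found


# Hypothetical bilingual lexicon standing in for a multilingual concept space.
LEXICON = {
    "cat": "CAT", "katze": "CAT",
    "dog": "DOG", "hund": "DOG",
    "sleeps": "SLEEP", "schläft": "SLEEP",
    "barks": "BARK", "bellt": "BARK",
    "on": "ON", "auf": "ON",
    "red": "RED", "roten": "RED",
    "sofa": "SOFA",
    "loudly": "LOUD", "laut": "LOUD",
}


def concept_vector(text):
    """Map a text to a bag of concept ids; words missing from the lexicon are ignored."""
    return Counter(LEXICON[t] for t in text.lower().split() if t in LEXICON)


def cosine(u, v):
    """Cosine similarity of two sparse Counter vectors (0.0 if either is empty)."""
    dot = sum(u[c] * v[c] for c in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def linear_scan(query, docs, threshold=0.8):
    """Compare the query with every document in the collection: |D| comparisons."""
    q = concept_vector(query)
    return {doc_id for doc_id, text in docs.items()
            if cosine(q, concept_vector(text)) >= threshold}


docs = {
    "d1": "the cat sleeps on the red sofa",
    "d2": "the dog barks loudly",
}
index = FingerprintIndex()
for doc_id, text in docs.items():
    index.add(doc_id, text)

en_query = "the cat sleeps on the red sofa today"       # English near-duplicate of d1
de_query = "die katze schläft heute auf dem roten sofa"  # German translation of en_query

print(index.candidates(en_query))   # {'d1'}
print(index.candidates(de_query))   # set()
print(linear_scan(de_query, docs))  # {'d1'}
```

In this toy run the fingerprint index finds the English near-duplicate by reading only a few postings lists, but the German translation shares no word n-grams with d1 and therefore no hashes, so the index returns nothing. The concept-space comparison does find d1, yet only because `linear_scan` scores every document in `docs`, which is the linear cost the abstract argues cannot be avoided.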
Conference: European Colloquium on IR Research (ECIR), pp. 640-644, 2010