Zerber+R: top-k retrieval from a confidential index
Privacy-preserving document exchange among collaboration groups in an enterprise as well as across enterprises requires techniques for sharing and search of access-controlled information through largely untrusted servers. In these settings search systems need to provide confidentiality guarantees for shared information while offering IR properties comparable to the ordinary search engines. Top-k is a standard IR technique which enables fast query execution on very large indexes and makes systems highly scalable. However, indexing access-controlled information for top-k retrieval is a challenging task due to the sensitivity of the term statistics used for ranking. In this paper we present Zerber+R - a ranking model which allows for privacy-preserving top-k retrieval from an outsourced inverted index. We propose a relevance score transformation function which makes relevance scores of different terms indistinguishable, such that even if stored on an untrusted server they do not reveal information about the indexed data. Experiments on two real-world data sets show that Zerber+R makes economical usage of bandwidth and offers retrieval properties comparable with an ordinary inverted index.