Bayesian approach for near-duplicate image detection
Vote-based algorithms are widely used in tasks built on local image descriptors, including object matching, panoramic stitching, and near-duplicate detection. In this paper, we focus on the last of these applications and propose a Bayesian approach that gives a probabilistic interpretation to the distances between local descriptors in feature space. This contrasts with traditional schemes, in which the distances are used only to establish a simple unweighted vote count. Near-duplicate detection is needed in a wide range of applications: metadata retrieval in cultural institutions, detection of copyright violations, duplicate elimination in storage, and others. Most current solutions are based either on voting algorithms, which are very precise but expensive, or on visual dictionaries, which are efficient but less precise. Unlike raw-vote systems, our scheme performs few database accesses; unlike dictionary-based systems, it allows fine control over the trade-off between precision and efficiency. In our experiments, it yields 99% accuracy with fewer than 10 database accesses, in contrast with the hundreds needed by raw-voting schemes.
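The contrast between an unweighted vote count and a probabilistic reading of descriptor distances can be sketched as follows. This is only an illustration under stated assumptions, not the model used in the paper: the exponential likelihoods, the parameter values (`prior`, `mean_dup`, `mean_other`, `stop_at`, `radius`), and the function names are all hypothetical. Each descriptor distance updates the log-odds that a candidate image is a near-duplicate, and the loop stops as soon as the posterior is confident enough, which is one way a probabilistic scheme could trade precision against the number of database accesses.

```python
import math


def exp_pdf(d, mean):
    """Density of an exponential distribution with the given mean (assumed model)."""
    return math.exp(-d / mean) / mean


def sigmoid(x):
    """Numerically stable logistic function, mapping log-odds to a probability."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def posterior_duplicate(distances, prior=0.01, mean_dup=0.2, mean_other=1.0,
                        stop_at=0.99):
    """Sequentially update P(near-duplicate | distances seen so far).

    Assumption: distances are exponentially distributed, with a smaller mean
    when the candidate is a true near-duplicate. Returns the posterior and
    the number of distances (standing in for database accesses) consumed.
    """
    log_odds = math.log(prior / (1.0 - prior))
    posterior = prior
    used = 0
    for d in distances:
        # Each distance contributes a log-likelihood ratio, not a flat vote.
        log_odds += math.log(exp_pdf(d, mean_dup) / exp_pdf(d, mean_other))
        used += 1
        posterior = sigmoid(log_odds)
        if posterior >= stop_at:
            break  # confident enough: stop querying
    return posterior, used


def raw_votes(distances, radius=0.5):
    """Unweighted baseline: every distance within the radius counts as one vote."""
    return sum(1 for d in distances if d <= radius)


if __name__ == "__main__":
    close_matches = [0.05] * 10
    print(posterior_duplicate(close_matches))
    print(raw_votes(close_matches))
```

Under the raw count, a distance of 0.05 and a distance of 0.45 each contribute exactly one vote; in the probabilistic version the closer match moves the posterior much further. Lowering or raising the hypothetical `stop_at` threshold is the knob that exchanges confidence for fewer accesses.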
Published in 2012.