High Performance Data Mining Using the Nearest Neighbor Join

High Performance Data Mining Using the Nearest Neighbor Join,10.1109/ICDM.2002.1183884,Christian Böhm,Florian Krebs

High Performance Data Mining Using the Nearest Neighbor Join   (Citations: 22)
BibTex | RIS | RefWorks Download
The similarity join has become an important database primitive to support similarity search and data mining. A similarity join combines two sets of complex objects such that the result con- tains all pairs of similar objects. Well-known are two types of the similarity join, the distance range join where the user defines a distance threshold for the join, and the closest point query or k-distance join which retrieves the k most similar pairs. In this paper, we investigate an important, third similarity join opera- tion called k-nearest neighbor join which combines each point of one point set with its k nearest neighbors in the other set. It has been shown that many standard algorithms of Knowledge Dis- covery in Databases (KDD) such as k-means and k-medoid clus- tering, nearest neighbor classification, data cleansing, postpro- cessing of sampling-based data mining etc. can be implemented on top of the k-nn join operation to achieve performance im- provements without affecting the quality of the result of these al- gorithms. We propose a new algorithm to compute the k-nearest neighbor join using the multipage index (MuX), a specialized in- dex structure for the similarity join. To reduce both CPU and I/O cost, we develop optimal loading and processing strategies.
Cumulative Annual
View Publication
The following links allow you to view full publications. These links are maintained by other sources not affiliated with Microsoft Academic Search.
Sort by: