Keywords (11)
Complex Objects
Data Cleansing
Data Mining
High Performance
Indexation
K Nearest Neighbor
Nearest Neighbor Classification
Similarity Join
Similarity Search
K Means
Nearest Neighbor
High Performance Data Mining Using the Nearest Neighbor Join
(Citations: 22)
Christian Böhm, Florian Krebs
The similarity join has become an important database primitive to support similarity search and data mining. A similarity join combines two sets of complex objects such that the result contains all pairs of similar objects. Well-known are two types of the similarity join: the distance range join, where the user defines a distance threshold for the join, and the closest point query or k-distance join, which retrieves the k most similar pairs. In this paper, we investigate an important third similarity join operation called the k-nearest neighbor join, which combines each point of one point set with its k nearest neighbors in the other set. It has been shown that many standard algorithms of Knowledge Discovery in Databases (KDD), such as k-means and k-medoid clustering, nearest neighbor classification, data cleansing, postprocessing of sampling-based data mining, etc., can be implemented on top of the k-nn join operation to achieve performance improvements without affecting the quality of the result of these algorithms. We propose a new algorithm to compute the k-nearest neighbor join using the multipage index (MuX), a specialized index structure for the similarity join. To reduce both CPU and I/O cost, we develop optimal loading and processing strategies.
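The three join types named in the abstract differ only in which pairs they keep, which a brute-force sketch can make concrete. The code below is an illustration only, not the MuX-based algorithm the paper proposes: it scans all pairs in O(|R|·|S|) time, assumes Euclidean distance (the abstract does not fix a metric), and the function names `distance_range_join`, `k_distance_join`, and `knn_join` are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
Pair = Tuple[int, int]  # (index into R, index into S)


def dist(p: Point, q: Point) -> float:
    """Euclidean distance; an assumed metric, not specified in the abstract."""
    return math.dist(p, q)


def distance_range_join(R: List[Point], S: List[Point], eps: float) -> List[Pair]:
    """All pairs whose distance does not exceed the user-defined threshold eps."""
    return [(i, j)
            for i, r in enumerate(R)
            for j, s in enumerate(S)
            if dist(r, s) <= eps]


def k_distance_join(R: List[Point], S: List[Point], k: int) -> List[Pair]:
    """The k most similar pairs over the whole cross product of R and S."""
    pairs = ((dist(r, s), i, j)
             for i, r in enumerate(R)
             for j, s in enumerate(S))
    return [(i, j) for _, i, j in heapq.nsmallest(k, pairs)]


def knn_join(R: List[Point], S: List[Point], k: int) -> List[Pair]:
    """Each point of R combined with its k nearest neighbors in S."""
    result: List[Pair] = []
    for i, r in enumerate(R):
        nearest = heapq.nsmallest(k, ((dist(r, s), j) for j, s in enumerate(S)))
        result.extend((i, j) for _, j in nearest)
    return result


R = [(0.0, 0.0), (10.0, 10.0)]
S = [(0.0, 1.0), (0.0, 3.0), (9.0, 10.0)]
range_pairs = distance_range_join(R, S, eps=3.5)
closest_pairs = k_distance_join(R, S, k=2)
knn_pairs = knn_join(R, S, k=2)
```

Note the asymmetry of the k-nn join: every point of R contributes exactly k pairs (when S has at least k points), whereas the k-distance join returns k pairs in total and the range join returns a data-dependent number of pairs. With k = 1 and S holding the current cluster centers, `knn_join` corresponds to the assignment step of k-means, which is one way to read the abstract's claim that such algorithms can sit on top of the join.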
Conference: IEEE International Conference on Data Mining (ICDM), pp. 43-50, 2002
DOI: 10.1109/ICDM.2002.1183884
Citation Context (15)
...A more general version is the kNN-Join problem [7], [8], [11], [31], [33]: Given a data set P and a query set Q, for each point q ∈ Q we would like to retrieve its k nearest neighbors from points in P...
...Finally, the kNN-Join has also been studied [7], [8], [11], [31], [33]...
Bin Yao, et al.
K nearest neighbor queries and kNN-Joins in large relational databases...
...Such nearest neighbor and a distance-based join operations have been used as a basic and underlying operation in many data mining applications, multimedia and spatial GIS databases, online decision support, and Internet search applications [12], [13], [14], [15]...
You Jung Kim, et al.
Performance Comparison of the R*-Tree and the Quadtree for kNN and Dist...
...Nearest Neighbor Search (NNS) is an important technique in a variety of applications including pattern recognition [6], vision [13], or data mining [1,5]...
Eva Gómez-Ballester, et al.
Combining Elimination Rules in Tree-Based Nearest Neighbor Search Algo...
...Bohm and Krebs [3] discuss the k-nearest neighbor join, which associates two sets of spatial data objects DA and DB and a cardinality threshold k; the output is a set of pairs from DA and DB that include, for each data object from DA, its k NNs in DB. Shou et al. [37] study the iceberg distance join where, given two spatial data sets DA and DB, a distance threshold ε, and a cardinality threshold k, the target is to retrieve all pairs of ...
Yunjun Gao, et al.
Optimal-Location-Selection Query Processing in Spatial Databases
...The clustering model was based on link analysis over the customers' relationship records, followed by a clustering approach using the nearest neighbor technique [6]...
Carlos André Reis Pinheiro, et al.
Customer's Relationship Segmentation Driving the Predictive Modeling f...
