Adaptive Filtering for Efficient Record Linkage

Lifang Gu, Rohan A. Baxter

Citations: 20
The process of identifying record pairs that represent the same real-world entity in multiple databases, commonly known as record linkage, is one of the important initial steps in many data mining applications. Record linkage of millions of records is a computationally expensive task. Various blocking methods have been used in record linkage systems to reduce the number of record pairs for comparison. A good blocking key is critical to the success of a blocking method and will ideally result in a lot of small blocks. However, in practice, there are almost always large blocks no matter how good the blocking key is. For example, when blocking on surname for an Anglo-Celtic population, 'Smith' and 'Taylor' are populous and result in very large block sizes. The efficiency of a blocking method is hindered by these large blocks since the resulting number of record pairs is dominated by the sizes of these large blocks. In this paper, we present an adaptive filtering algorithm to post-process large blocks to enhance the blocking efficiency. Experimental results show that our filtering algorithm can reduce the number of record pairs produced by the standard blocking method by 88% on a small real-world data set. The algorithm also reduces the number of record pairs generated by a 3-pass standard blocking method by 50% on several synthetic test data sets, with minimal loss of accuracy.
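The abstract's point about large blocks dominating the pair count can be illustrated with a minimal sketch of standard blocking. This is not the paper's adaptive filtering algorithm, which the abstract does not specify; it only shows the baseline behaviour. The toy records, the `surname` field, and the function names `block_by_key` and `pairs_in_block` are all assumptions made for illustration, and the sketch assumes deduplication within a single list of records.

```python
from collections import defaultdict


def block_by_key(records, key):
    """Group record ids by the value of a blocking key (standard blocking)."""
    blocks = defaultdict(list)
    for rec_id, rec in records.items():
        blocks[rec[key]].append(rec_id)
    return dict(blocks)


def pairs_in_block(n):
    """Number of unordered record pairs within a block of n records."""
    return n * (n - 1) // 2


# Hypothetical toy data: 'Smith' is deliberately over-represented.
records = {
    1: {"surname": "Smith"},
    2: {"surname": "Smith"},
    3: {"surname": "Smith"},
    4: {"surname": "Smith"},
    5: {"surname": "Taylor"},
    6: {"surname": "Taylor"},
    7: {"surname": "Gu"},
    8: {"surname": "Baxter"},
}

blocks = block_by_key(records, "surname")
pair_counts = {name: pairs_in_block(len(ids)) for name, ids in blocks.items()}
total_pairs = sum(pair_counts.values())

for name, count in sorted(pair_counts.items(), key=lambda item: -item[1]):
    print(f"{name}: {count} pairs")
print(f"total candidate pairs: {total_pairs}")
```

Because the pair count grows quadratically with block size, the single largest block here contributes most of the candidate pairs, which is the effect the abstract describes for populous surnames.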
    • ...More specifically, we perform numerical tests using the Freely Extensible Biomedical Record Linkage (FEBRL) database—it is a large data repository containing synthetic census data and has been used in prior studies in record linkage [9], [18]...

    Debabrata Dey et al. Efficient Techniques for Online Record Linkage

    • ...Various blocking techniques [1], [2], [3], [4], [5], [6], [7], [8], [9] have been proposed to make ER scalable...
    • ...Various Blocking techniques focusing on accuracy ([1], [2], [3], [4], [5]) and performance ([6], [7], [8], [9]) have been proposed to enhance the Entity Resolution process...

    Steven Euijong Whang et al. Entity resolution with iterative blocking

    • ...In practise, blocking [3, 20, 49], sorting [27], filtering [24], clustering [32], or indexing [3, 10] techniques are used to reduce the number of record pair comparisons (as discussed in Sect...
    • ... these techniques aim at cheaply removing as many record pairs as possible from the set of non-matches U that are obvious non-matches, without removing any pairs from the set of matches M . Two complexity measures that quantify the efficiency and quality of such blocking methods have recently been proposed [18] (citations given refer to data linkage or deduplication publications that have used these measures): • Reduction ratio [3, 18, 24] ...
    • ...• Pairs completeness [3, 18, 24] is measured as pc = N...

    Peter Christen et al. Quality and Complexity Measures for Data Linkage and Deduplication
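The two measures named in the excerpts above, reduction ratio and pairs completeness, can be sketched as follows. The excerpt truncates the formula for pairs completeness, so the definitions used here are the ones commonly given in the record linkage literature and should be read as an assumption rather than a reconstruction of the cited text: reduction ratio is one minus the fraction of all possible pairs that survive blocking, and pairs completeness is the fraction of true matches that remain among the candidate pairs. The function names and the toy pair sets are hypothetical.

```python
def reduction_ratio(n_candidate_pairs, n_total_pairs):
    """Fraction of all possible record pairs removed by blocking (assumed definition)."""
    if n_total_pairs <= 0:
        raise ValueError("n_total_pairs must be positive")
    return 1.0 - n_candidate_pairs / n_total_pairs


def pairs_completeness(candidate_pairs, true_matches):
    """Fraction of true matching pairs kept in the candidate set (assumed definition)."""
    if not true_matches:
        raise ValueError("true_matches must not be empty")
    return len(candidate_pairs & true_matches) / len(true_matches)


# Pairs are stored as frozensets so that (1, 2) and (2, 1) are the same pair.
candidate_pairs = {
    frozenset((1, 2)),
    frozenset((1, 3)),
    frozenset((2, 3)),
    frozenset((5, 6)),
}
true_matches = {
    frozenset((1, 2)),
    frozenset((5, 6)),
    frozenset((7, 8)),  # a true match that blocking failed to keep
}

n_records = 8
n_total_pairs = n_records * (n_records - 1) // 2

rr = reduction_ratio(len(candidate_pairs), n_total_pairs)
pc = pairs_completeness(candidate_pairs, true_matches)
print(f"reduction ratio: {rr:.3f}")
print(f"pairs completeness: {pc:.3f}")
```

The two numbers pull in opposite directions: more aggressive blocking raises the reduction ratio but risks dropping true matches and lowering pairs completeness.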

    • ...In the final comparison, we use synthetic Census data, called “dataset4” from (Gu & Baxter 2004)...
    • ...Table 6 also compares the adaptive filtering method (Gu & Baxter 2004) to our method...
    • ...As already stated, (Baxter, Christen, & Churches 2003), created the bi-gram indexing method and Gu and Baxter (Gu & Baxter 2004) provided method refinements...

    Matthew Michelson et al. Learning Blocking Schemes for Record Linkage

    • ...A number of blocking methods have been proposed by researchers for speeding up record linkage and clustering [15, 22, 30, 20, 18, 26, 1, 6, 21, 17, 41]...

    Mikhail Bilenko et al. Adaptive Blocking: Learning to Scale Up Record Linkage
