A Heterogeneous Field Matching Method for Record Linkage

A Heterogeneous Field Matching Method for Record Linkage,10.1109/ICDM.2005.7,Steven N. Minton,Claude Nanjo,Craig A. Knoblock,Martin Michalowski,Matthe

Citations: 27
Record linkage is the process of determining that two records refer to the same entity. A key subprocess is evaluating how well the individual fields, or attributes, of the records match each other. One approach to matching fields is to use hand-written domain-specific rules. This "expert systems" approach may result in good performance for specific applications, but it is not scalable. This paper describes a new machine learning approach that creates expert-like rules for field matching. In our approach, the relationship between two field values is described by a set of heterogeneous transformations. Previous machine learning methods used simple models to evaluate the distance between two fields. However, our approach enables more sophisticated relationships to be modeled, which better capture the complex domain-specific, common-sense phenomena that humans use to judge similarity. We compare our approach to methods that rely on simpler homogeneous models in several domains. By modeling more complex relationships we produce more accurate results.
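To make the idea of describing a field pair by a set of heterogeneous transformations more concrete, here is a minimal illustrative sketch. It is not the method from the paper: the transformation names (`equal`, `synonym`, `abbreviation`, `initial`, `missing`), the lookup tables, and the greedy token alignment are all assumptions chosen for illustration. It only shows how two field values might be related by several different kinds of token-level transformations rather than by a single distance score.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lookup tables (assumptions for illustration only).
SYNONYMS: Dict[str, str] = {"bob": "robert", "bill": "william"}
ABBREVIATIONS: Dict[str, str] = {"st": "street", "ave": "avenue"}


def tokenize(value: str) -> List[str]:
    """Lowercase the field value and split it into tokens, dropping '.' and ','."""
    return value.lower().replace(".", " ").replace(",", " ").split()


def classify_pair(a: str, b: str) -> Optional[str]:
    """Return the name of the transformation relating tokens a and b, or None."""
    if a == b:
        return "equal"
    if SYNONYMS.get(a) == b or SYNONYMS.get(b) == a:
        return "synonym"
    if ABBREVIATIONS.get(a) == b or ABBREVIATIONS.get(b) == a:
        return "abbreviation"
    if (len(a) == 1 and b.startswith(a)) or (len(b) == 1 and a.startswith(b)):
        return "initial"
    return None


def describe(left: str, right: str) -> List[Tuple[str, str, str]]:
    """Greedily align tokens and label each alignment with a transformation.

    Tokens with no counterpart are labelled "missing".
    """
    remaining = tokenize(right)
    result: List[Tuple[str, str, str]] = []
    for tok in tokenize(left):
        match: Optional[Tuple[str, str]] = None
        for cand in remaining:
            label = classify_pair(tok, cand)
            if label is not None:
                match = (cand, label)
                break
        if match is not None:
            remaining.remove(match[0])
            result.append((tok, match[0], match[1]))
        else:
            result.append((tok, "", "missing"))
    # Tokens on the right that were never matched.
    result.extend(("", cand, "missing") for cand in remaining)
    return result


if __name__ == "__main__":
    for row in describe("Bob J. Smith", "Robert Smith"):
        print(row)
```

In a sketch like this, the list of labelled transformations, rather than a single edit-distance number, would be the input that a learned model scores when deciding whether two fields match.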
    • ...Minton et al. [26] describe a new machine learning approach that creates expert-like rules for field matching, a key subprocess in record linkage...

    Debabrata Dey et al. Efficient Techniques for Online Record Linkage

    • ...Training-based approaches, e.g., Naïve Bayes [49], logistic regression [46], Support Vector Machine (SVM) [11,43,49] or decision trees [63,29,49,53,54,56] have so far been used for some subtasks, e.g., determining suitable parameterizations for matchers or adjusting combination functions parameters (weights for matchers, offsets)...

    Hanna Köpcke et al. Frameworks for entity matching: A comparison

    • ...Most of the existing similarity join methods belong to supervised learning methods, such as [23][24], unsupervised learning methods, such as [22][25], or semi-supervised learning methods, such as [26]...

    Bilal Hawashin et al. Diffusion Maps: A Superior Semantic Method to Improve Similarity Join ...

    • ...Recent work [4, 14, 26] has recognized that textual similarity alone is inadequate in matching strings that are syntactically far apart but still represent the same real-world object...
    • ...The notion of string transformations has been identified in recent work [4, 14, 26] as a way of overcoming this inadequacy...
    • ...Robert are used to boost the similarity between strings that are textually far apart [4, 14, 26]...

    Parag Agrawal et al. On indexing error-tolerant set containment

    • ...The problem of object consolidation is related to the problem of record linkage or deduplication [1], [2]...
    • ...In the literature, there are many object consolidation and record linkage techniques [1], [2] that are similar to the spirit of combine operation but often they use domain specific knowledge or training data sets that make them unsuitable for autonomous integration...

    Shazzad Hosain et al. Algebraic operator support for semantic data fusion in extended SQL
