
# A Theory for Record Linkage

Authors: Ivan P. Fellegi, Alan B. Sunter

DOI: 10.1080/01621459.1969.10501049

A mathematical model is developed to provide a theoretical framework for a computer-oriented solution to the problem of recognizing those records in two files which represent identical persons, objects or events (said to be matched).

A comparison is to be made between the recorded characteristics and values in two records (one from each file), and a decision made as to whether or not the members of the comparison pair represent the same person or event, or whether there is insufficient evidence to justify either of these decisions at stipulated levels of error. These three decisions are referred to as a link (A1), a non-link (A3), and a possible link (A2). The first two decisions are called positive dispositions.

The two types of error are defined as the error of the decision A1 when the members of the comparison pair are in fact unmatched, and the error of the decision A3 when the members of the comparison pair are in fact matched. The probabilities of these errors are defined as

$$\mu = \sum_{\gamma \in \Gamma} u(\gamma)\, P(A_1 \mid \gamma) \qquad \text{and} \qquad \lambda = \sum_{\gamma \in \Gamma} m(\gamma)\, P(A_3 \mid \gamma)$$

respectively, where u(γ) and m(γ) are the probabilities of realizing γ (a comparison vector whose components are the coded agreements and disagreements on each characteristic) for unmatched and matched record pairs, respectively. The summation is over the whole comparison space Γ of possible realizations.

A linkage rule assigns probabilities P(A1|γ), P(A2|γ), and P(A3|γ) to each possible realization γ ∈ Γ. An optimal linkage rule L(μ, λ, Γ) is defined for each value of (μ, λ) as the rule that minimizes P(A2) at those error levels. In other words, for fixed levels of error, the rule minimizes the probability of failing to make positive dispositions.

A theorem describing the construction and properties of the optimal linkage rule is given, together with two corollaries that make it a practical working tool.
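The error definitions and the idea of an optimal rule can be made concrete with a small Python sketch. It assumes three binary comparison fields, made-up per-field agreement probabilities, and independence between fields given match status; none of these come from the abstract. In the published paper the optimal rule orders the comparison space by the ratio m(γ)/u(γ), assigning links from the top and non-links from the bottom; the sketch follows that ordering but makes only deterministic decisions, omitting the randomized choice the optimal rule may need at boundary patterns, so the achieved error levels are upper-bounded by (not equal to) the requested μ and λ. The helper names `pattern_prob`, `linkage_rule`, and `error_levels` are hypothetical.

```python
from itertools import product

# ASSUMPTION: three binary comparison fields (1 = agree, 0 = disagree) with
# made-up per-field agreement probabilities, treated as conditionally
# independent given match status. The abstract does not specify these.
M_FIELD = [0.95, 0.90, 0.85]  # hypothetical P(agree on field k | matched)
U_FIELD = [0.05, 0.10, 0.20]  # hypothetical P(agree on field k | unmatched)


def pattern_prob(gamma, field_probs):
    """Probability of agreement pattern gamma under field independence."""
    p = 1.0
    for agrees, q in zip(gamma, field_probs):
        p *= q if agrees else 1.0 - q
    return p


# Comparison space Gamma: every agree/disagree pattern over the fields.
GAMMA = list(product((0, 1), repeat=len(M_FIELD)))
m = {g: pattern_prob(g, M_FIELD) for g in GAMMA}  # m(gamma)
u = {g: pattern_prob(g, U_FIELD) for g in GAMMA}  # u(gamma)


def linkage_rule(mu, lam):
    """Hypothetical deterministic sketch of the optimal-rule construction.

    Patterns are ranked by decreasing m(gamma)/u(gamma). Links (A1) are taken
    from the top while the accumulated u-mass stays within mu; non-links (A3)
    are taken from the bottom while the accumulated m-mass stays within lam.
    Everything else is a possible link (A2). No randomized boundary decision.
    """
    ranked = sorted(GAMMA, key=lambda g: m[g] / u[g], reverse=True)
    decision = {g: "A2" for g in GAMMA}

    spent_u = 0.0
    for g in ranked:
        if spent_u + u[g] > mu:
            break
        decision[g] = "A1"
        spent_u += u[g]

    spent_m = 0.0
    for g in reversed(ranked):
        # Stop if the next pattern is already a link or would exceed lam.
        if decision[g] != "A2" or spent_m + m[g] > lam:
            break
        decision[g] = "A3"
        spent_m += m[g]
    return decision


def error_levels(decision):
    """mu = sum of u(gamma) over links, lam = sum of m(gamma) over non-links
    (P(A1|gamma) and P(A3|gamma) are 0 or 1 for a deterministic rule)."""
    mu_hat = sum(u[g] for g in GAMMA if decision[g] == "A1")
    lam_hat = sum(m[g] for g in GAMMA if decision[g] == "A3")
    return mu_hat, lam_hat


rule = linkage_rule(mu=0.02, lam=0.02)
print(sorted(g for g in GAMMA if rule[g] == "A2"))  # [(0, 1, 1), (1, 0, 0)]
mu_hat, lam_hat = error_levels(rule)
print(round(mu_hat, 5), round(lam_hat, 5))
```

With these assumed numbers and μ = λ = 0.02, three patterns are linked, three are non-linked, and the two middle patterns, (0, 1, 1) and (1, 0, 0), are left as possible links (A2). Tightening either error level moves more patterns into A2, which is the trade-off the optimal rule is designed to manage.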
Journal: Journal of the American Statistical Association (J AMER STATIST ASSN), vol. 64, no. 328, pp. 1183-1210, 1969

## Citation Context (356)

• ...There has been much work in the database and data mining communities on matching structured records to other structured records, including record linkage [8, 19, 20, 25], entity resolution [3, 23], and duplicate detection [7, 22]...

### Rakesh Agrawal, et al. Aggregating web offers to determine product prices

• ...A natural way of achieving this is to consider decision rules akin to those in record linkage problems (Fellegi and Sunter, 1969), eg, a sample unique is classified as a population unique (a positive) when...

### Daniel Manrique-Vallier, et al. Estimating Identification Disclosure Risk Using Mixed Membership Model...

• ...Geographic area is represented by the province of residence while the probabilistic decision model follows the Fellegi and Sunter approach [6]...

### Irene Rocchetti, et al. Modeling delay in diagnosis of NF: under reporting, incidence and pre...

• ...While early work on EM dates back to the 1960s [18, 9], the problem has recently received renewed interest due mainly to the growing popularity of data exchange and sharing (especially, in the science community), and advances in technologies for web-scale information extraction and management...
• ...Conventional approaches to EM (dating back to the seminal work of Newcombe [18] and Fellegi and Sunter [9]) have focused on discovering independent pair-wise matches of entities using a variety of attribute-similarity measures (based, for instance, on approximate string matching)...
• ...Newcombe [18] and Fellegi and Sunter [9], gave the problem a probabilistic foundation by posing EM as a classification problem (i.e., deciding a pair to be a match or a non-match) based on attribute-similarity scores...

### Vibhor Rastogi, et al. Large-Scale Collective Entity Matching

• ...The problem of matching records has been studied under various topics including record linkage [2, 3, 4, 5], duplicate detection [6, 7], entity resolution [8, 9, 10], and merge/purge [11]...
• ...For instance, while the work of Newcombe [4] (later formalized by Fellegi and Sunter in [3]) pioneered the probabilistic approach to matching, their work (and much of the subsequent record linkage literature) tacitly assumes that the data to be matched consists of properly structured records with a well-defined schema...
