In Search of an Accuracy Metric (Data and Information Quality Metrics)
Practitioners and researchers often refer to error rates or accuracy percentages of databases. The former is the number of cells in error divided by the total number of cells; the latter is the number of correct cells divided by the total number of cells. However, databases may have similar error rates (or accuracy percentages) yet differ drastically in the severity of their accuracy problems. A simple percentage does not indicate whether the errors are systematic, such as one record with 20 fields in error, or random, such as 20 errors distributed throughout the database. The difference lies in the degree of randomness, or complexity, of the errors. We expand the accuracy metric to include a complexity (randomness) measure and a probability distribution value. The proposed randomness check is based on the Lempel-Ziv (LZ) complexity measure. The main candidate for the probability distribution parameter is Poisson's lambda. The resulting metric allows management to distinguish between databases that have similar accuracy percentages and error rates but differ drastically in the complexity of their quality problems.
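The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices here are assumptions: each cell is represented by a true/false error flag, the flags are flattened row by row into a binary string, complexity is taken as the phrase count of a Lempel-Ziv (1976) style parsing, and Poisson's lambda is estimated as the mean number of errors per record. The names `error_rate`, `lz76_phrase_count`, `poisson_lambda`, and `describe` are hypothetical, as are the 20 x 20 example databases.

```python
import random


def error_rate(flags):
    """Cells in error divided by total cells (flags[r][c] is True for an error)."""
    cells = [f for row in flags for f in row]
    return sum(cells) / len(cells)


def lz76_phrase_count(s):
    """Count phrases in a Lempel-Ziv (1976) style parsing of the string s.

    Each phrase is the shortest substring starting at the current position
    that does not already occur in the text preceding its last character.
    """
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def poisson_lambda(flags):
    """Assumed estimator: mean number of errors per record (row)."""
    return sum(sum(row) for row in flags) / len(flags)


def describe(flags):
    """Report the three values side by side for one database."""
    bits = "".join("1" if f else "0" for row in flags for f in row)
    return {
        "error_rate": error_rate(flags),
        "lz_complexity": lz76_phrase_count(bits),
        "poisson_lambda": poisson_lambda(flags),
    }


ROWS, COLS, ERRORS = 20, 20, 20  # hypothetical database size

# Systematic case: one record (row 7) with all 20 fields in error.
clustered = [[r == 7 for _ in range(COLS)] for r in range(ROWS)]

# Random case: 20 errors scattered over the 400 cells (fixed seed for repeatability).
rng = random.Random(0)
bad_cells = set(rng.sample(range(ROWS * COLS), ERRORS))
scattered = [[(r * COLS + c) in bad_cells for c in range(COLS)] for r in range(ROWS)]

print("clustered:", describe(clustered))
print("scattered:", describe(scattered))
```

Both example databases have the same error rate, so the percentage alone cannot separate them, while the phrase count is higher for the scattered errors than for the single bad record. How the paper combines the complexity measure and the Poisson parameter into one metric is not stated in the abstract, so the sketch only reports the values next to each other.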