Keywords (8): Database System, Discrete Distribution, Goal Orientation, Probabilistic Database, Query Processing, Sampling Strategy, Selective Constraint, Symbolic Representation
PIP: A database system for great and small expectations
(Citations: 3)
Oliver Kennedy, Christoph Koch
Estimation via sampling out of highly selective join queries is well known to be problematic, most notably in online aggregation. Without goal-directed sampling strategies, samples falling outside of the selection constraints lower estimation efficiency at best, and cause inaccurate estimates at worst. This problem appears in general probabilistic database systems, where query processing is tightly coupled with sampling. By committing to a set of samples before evaluating the query, the engine wastes effort on samples that will be discarded, query processing that may need to be repeated, or unnecessarily large numbers of samples. We describe PIP, a general probabilistic database system that uses symbolic representations of probabilistic data to defer computation of expectations, moments, and other statistical measures until the expression to be measured is fully known. This approach is sufficiently general to admit both continuous and discrete distributions. Moreover, deferring sampling enables a broad range of goal-oriented sampling-based (as well as exact) integration techniques for computing expectations, allows the selection of the integration strategy most appropriate to the expression being measured, and can reduce the amount of sampling work required. We demonstrate the effectiveness of this approach by showing that even straightforward algorithms can make use of the added information. These algorithms have a profoundly positive impact on the efficiency and accuracy of expectation computations, particularly in the case of highly selective join queries.
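The cost of committing to samples before the selection constraint is known can be illustrated with a small sketch. This is not PIP's implementation: the exponential distribution, the threshold predicate, and the function names below are all assumptions chosen for illustration. The first function draws samples up front and discards those that fail the predicate; the second waits until the predicate is known and samples only from the satisfying region by inverting the survival function.

```python
import math
import random


def naive_conditional_mean(rate, threshold, n, rng):
    """Draw n samples first, then discard those failing X > threshold."""
    kept = [x for x in (rng.expovariate(rate) for _ in range(n)) if x > threshold]
    mean = sum(kept) / len(kept) if kept else float("nan")
    return mean, len(kept)


def constrained_conditional_mean(rate, threshold, n, rng):
    """Sample only from the region X >= threshold (hypothetical goal-directed variant).

    For X ~ Exponential(rate), the survival function is S(x) = exp(-rate * x).
    Drawing s uniformly on (0, S(threshold)] and inverting S yields samples
    that all satisfy the constraint, so none are wasted.
    """
    surv_at_threshold = math.exp(-rate * threshold)  # P(X > threshold)
    total = 0.0
    for _ in range(n):
        s = surv_at_threshold * (1.0 - rng.random())  # uniform on (0, S(threshold)]
        total += -math.log(s) / rate
    return total / n, n


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20000
    naive_mean, naive_used = naive_conditional_mean(1.0, 5.0, n, rng)
    goal_mean, goal_used = constrained_conditional_mean(1.0, 5.0, n, rng)
    print(f"naive:       mean={naive_mean:.3f} from {naive_used} of {n} samples")
    print(f"constrained: mean={goal_mean:.3f} from {goal_used} of {n} samples")
```

With a rate of 1 and a threshold of 5, the predicate holds with probability exp(-5), under 1%, so the naive estimator bases its answer on a small fraction of its draws, while the constrained estimator uses every draw. The exact value of the conditional mean in this setup is 6, by the memoryless property of the exponential distribution.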
Conference: International Conference on Data Engineering - ICDE, pp. 157-168, 2010
DOI: 10.1109/ICDE.2010.5447879
View Publication
The following links allow you to view full publications. These links are maintained by other sources not affiliated with Microsoft Academic Search.
(www.informatik.uni-trier.de) (www.cs.cornell.edu) (dx.doi.org) (ieeexplore.ieee.org)
Citation Context (2)
...There is also recent interest in applying Monte Carlo algorithms for managing uncertain data [21, 11, 15, 26, 25, 17]...
Tingjian Ge, et al. Monte Carlo query processing of uncertain multidimensional array data
...There has been substantial work on uncertain data management lately (e.g., [10, 28, 24, 16, 17])...
Tingjian Ge. Join queries on uncertain data: Semantics and efficient processing
References (20)
Fast and Simple Relational Processing of Uncertain Data (Citations: 50)
Lyublena Antova, Thomas Jansen, Christoph Koch, Dan Olteanu
Conference: International Conference on Data Engineering - ICDE, pp. 983-992, 2008
Evaluating probabilistic queries over imprecise data (Citations: 306)
Reynold Cheng, Dmitri V. Kalashnikov, Sunil Prabhakar
Conference: International Conference on Management of Data - SIGMOD, pp. 551-562, 2003
Efficient query evaluation on probabilistic databases (Citations: 94)
Nilesh N. Dalvi, Dan Suciu
Journal: The VLDB Journal - VLDB, vol. 16, no. 4, pp. 523-544, 2007
MauveDB: supporting model-based user views in database systems (Citations: 74)
Amol Deshpande, Samuel Madden
Conference: International Conference on Management of Data - SIGMOD, pp. 73-84, 2006
Models for Incomplete and Probabilistic Information (Citations: 78)
Todd J. Green, Val Tannen
Journal: IEEE Data(base) Engineering Bulletin - DEBU, vol. 29, no. 1, pp. 278-296, 2006
Citations (3)
Monte Carlo query processing of uncertain multidimensional array data
Tingjian Ge, David Grabiner
Conference: International Conference on Data Engineering - ICDE, pp. 936-947, 2011
Join queries on uncertain data: Semantics and efficient processing
Tingjian Ge
Conference: International Conference on Data Engineering - ICDE, pp. 697-708, 2011
MCDB-R: Risk Analysis in the Database (Citations: 1)
Peter J. Haas, Christopher M. Jermaine, Subi Arumugam, Fei Xu, Luis Leopoldo Perez, Ravi Jampani
Journal: Proceedings of the VLDB Endowment - PVLDB, vol. 3, no. 1, pp. 782-793, 2010