Building on Quicksand

Building on Quicksand, Computing Research Repository, Pat Helland, David Campbell

Building on Quicksand   (Citations: 9)
Reliable systems have always been built out of unreliable components. Early on, the reliable components were small such as mirrored disks or ECC (Error Correcting Codes) in core memory. These systems were designed such that failures of these small components were transparent to the application. Later, the size of the unreliable components grew larger and semantic challenges crept into the application when failures occurred. As the granularity of the unreliable component grows, the latency to communicate with a backup becomes unpalatable. This leads to a more relaxed model for fault tolerance. The primary system will acknowledge the work request and its actions without waiting to ensure that the backup is notified of the work. This improves the responsiveness of the system. There are two implications of asynchronous state capture: 1) Everything promised by the primary is probabilistic. There is always a chance that an untimely failure shortly after the promise results in a backup proceeding without knowledge of the commitment. Hence, nothing is guaranteed! 2) Applications must ensure eventual consistency. Since work may be stuck in the primary after a failure and reappear later, the processing order for work cannot be guaranteed. Platform designers are struggling to make this easier for their applications. Emerging patterns of eventual consistency and probabilistic execution may soon yield a way for applications to express requirements for a "looser" form of consistency while providing availability in the face of ever larger failures. This paper recounts portions of the evolution of these trends, attempts to show the patterns that span these changes, and talks about future directions as we continue to "build on quicksand".
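The asynchronous state capture described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the `Primary` and `Backup` classes, the request ids, and the account-balance workload are all hypothetical and assumed only for illustration. The sketch shows a primary that acknowledges work before shipping it to the backup, a failure that leaves acknowledged work stuck in the primary, and idempotent, commutative application of requests so that both copies of the work converge once the stuck work reappears.

```python
from collections import deque


class Backup:
    """Hypothetical backup replica that applies requests idempotently."""

    def __init__(self):
        self.applied = set()  # ids of requests already applied
        self.balance = 0

    def apply(self, request_id, amount):
        # A request seen before is ignored, so replays are harmless.
        if request_id in self.applied:
            return False
        self.applied.add(request_id)
        self.balance += amount
        return True


class Primary:
    """Hypothetical primary that acknowledges before the backup is notified."""

    def __init__(self, backup):
        self.backup = backup
        self.applied = set()
        self.balance = 0
        self.outbox = deque()  # acknowledged work not yet shipped to the backup

    def handle(self, request_id, amount):
        if request_id not in self.applied:
            self.applied.add(request_id)
            self.balance += amount
            self.outbox.append((request_id, amount))
        # The acknowledgement is returned without waiting for the backup.
        return "ack"

    def flush(self):
        # Ship queued work to the backup; this may run long after the ack.
        while self.outbox:
            self.backup.apply(*self.outbox.popleft())


def demo():
    backup = Backup()
    primary = Primary(backup)

    primary.handle("r1", 10)
    primary.flush()           # r1 reaches the backup
    primary.handle("r2", 5)   # acknowledged, then the primary fails before flushing

    after_failover = backup.balance  # the backup proceeds without knowing about r2

    backup.apply("r3", 7)     # new work handled by the backup after failover
    backup.apply("r2", 5)     # the client retries r2 against the backup
    primary.flush()           # stuck work reappears later; the duplicate r2 is ignored

    return after_failover, backup.balance


if __name__ == "__main__":
    print(demo())
```

In this sketch the backup applies `r3` before `r2`, a different order from the one the primary saw. The final balance is unaffected only because the assumed operation (addition) is commutative and each request is applied at most once; work that is order-sensitive would need coordination or compensation instead.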
Journal: Computing Research Repository - CORR, vol. abs/0909.1, 2009
    • ...In many cases the core logic will be trivially monotonic, but with special-purpose escapes into nonmonotonicity that should either be “protected” by coordination, or managed via compensatory exception handling (Helland and Campbell’s “apologies” [30].) As a classic example, general ledger entries (debits and credits) accumulate monotonically, but account balance computation is non-monotonic; Amazon uses a (mostly) eventually-consistent ...

    Joseph M. Hellerstein. The declarative imperative: experiences and conjectures in distributed...

    • ...Techniques like strict or strong consistency and database-style transactions do not scale at Internet level and are rarely needed in modern large-scale distributed systems anyway [3][22][12]...

    Christian Tilgner. Declarative scheduling in highly scalable systems

    • ...through RAID [37]), the contemporary model is to accept that there will be faults and minimize its impact on the user (e.g. through replication) [7,8,9]...
    • ...Whenever possible, a workflow should be idempotent [9,23]...
    • ...Whenever possible, activities within a workflow should be designed to be idempotent, whereby a re-execution of the previously faulting workflow with the same input parameters ensures that the data recovers [9]...
    • ...Much of our work builds upon well known concepts in distributed systems, fault tolerant applications and replicated databases: fail fast, statefulness, graceful degradation, idempotent tasks, asynchronous recovery, and eventual consistency [7,9,24,47,48]...

    Yogesh Simmhan et al. Building Reliable Data Pipelines for Managing Community Data Using Scie...

    • ...On the other hand, new data management trends [16,17] (e.g., in the cloud computing field) suggest that temporary inconsistencies should be afforded by modern applications, and that this should be considered a strong requirement when scalability is a must...

    Francesc D. Muñoz-Escoí et al. Revising 1Copy Equivalence in Replicated Databases with Snapshot Isola...

    • ...However, many recent data management applications have a significantly increased need of scalability, while being flexible enough to partially sacrifice consistency [30, 34]...

    R. de Juan et al. A Survey of Scalability Approaches for Reliable Causal Broadcasts
