Whither Generic Recovery from Application Faults? A Fault Study using Open-Source Software

Authors: Subhachandra Chandra, Peter M. Che
DOI: 10.1109/ICDSN.2000.857521
Citations: 42
This paper tests the hypothesis that generic recovery techniques, such as process pairs, can survive most application faults without using application-specific information. We examine in detail the faults that occur in three large, open-source applications: the Apache web server, the GNOME desktop environment, and the MySQL database. Using information contained in the bug reports and source code, we classify faults based on how they depend on the operating environment. We find that 72-87% of the faults are independent of the operating environment and are hence deterministic (non-transient). Recovering from the failures caused by these faults requires the use of application-specific knowledge. Half of the remaining faults depend on a condition in the operating environment that is likely to persist on retry, and the failures caused by these faults are also likely to require application-specific recovery. Unfortunately, only 5-14% of the faults were triggered by transient conditions, such as timing and synchronization, that naturally fix themselves during recovery. Our results indicate that classical application-generic recovery techniques, such as process pairs, will not be sufficient to enable applications to survive most failures caused by application faults.
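To make the abstract's argument concrete, here is a minimal Python sketch that is not taken from the paper. It models the three fault classes described above and treats application-generic recovery (for example, a process pair's backup taking over) as simply re-executing the failed operation with no application-specific knowledge. The names `FaultClass`, `make_faulty_op`, and `generic_retry` are hypothetical, and each class is reduced to a fixed behavior as a simplifying assumption: environment-independent faults and persistent environment conditions fail on every attempt, while a transient condition such as a timing race clears after the first failure.

```python
import enum
from typing import Callable


class FaultClass(enum.Enum):
    # Trigger does not depend on the operating environment: deterministic.
    ENV_INDEPENDENT = "environment-independent"
    # Depends on an environment condition assumed to persist on retry.
    ENV_DEPENDENT_NONTRANSIENT = "environment-dependent, non-transient"
    # Depends on a transient condition (e.g. timing) assumed to clear on recovery.
    ENV_DEPENDENT_TRANSIENT = "environment-dependent, transient"


def make_faulty_op(fault: FaultClass) -> Callable[[int], bool]:
    """Return a hypothetical operation; attempt 0 is the first run, 1+ are retries."""
    def op(attempt: int) -> bool:
        if fault is FaultClass.ENV_DEPENDENT_TRANSIENT:
            # Simplification: the race does not recur once the operation is re-run.
            return attempt > 0
        # Same input and state (or same persistent condition): fails every time.
        return False
    return op


def generic_retry(op: Callable[[int], bool], max_retries: int = 3) -> bool:
    """Application-generic recovery: re-execute the operation without app knowledge."""
    for attempt in range(max_retries + 1):
        if op(attempt):
            return True
    return False


if __name__ == "__main__":
    for fault in FaultClass:
        survived = generic_retry(make_faulty_op(fault))
        print(f"{fault.value}: survived={survived}")
```

In this toy model only the transient class is survived, which mirrors why the paper concludes that retry-style generic recovery covers only a small fraction (5-14%) of the faults it studied.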
Conference: Dependable Systems and Networks (DSN), pp. 97-106, 2000
Citation contexts (excerpts from papers that cite this work):
    • ...Chandra and Chen [1] collect 12 concurrency bugs from three applications...

    John Marrero et al. How do programs become more concurrent: a story of program transformat...

    • ...This happens quite frequently when failures are due either to integration problems or to rare and faulty event sequences that passed unit testing [24], [25], [26], [27]...

    Leonardo Mariani et al. Dynamic Analysis for Diagnosing Integration Faults

    • ...While a few studies of concurrency bugs exist [11,14,22], they either focus on artificially injected bugs, or, in the few cases where real applications were studied, they mostly focus on the causes of these bugs, and limit the study of their effects to whether they cause deadlocks or not...
    • ...After filtering, we obtained a final set with 80 concurrency bugs that were analyzed, a number that is very close (or even superior) to the number of bugs analyzed in previous studies [11,22]...
    • ...The number of bugs analyzed in this study is comparable to the number of bugs analyzed in other related studies [11,22,32]...
    • ...Chandra et al. [11] looked at bug databases of three open-...

    Pedro Fonseca et al. A study of the internal and external effects of concurrency bugs

    • ...It is worth noting that Mandelbugs account for a significant part of failures in the operational phase, up to 82% in well-tested critical software [2], [5], [6]...
    • ...From past works on field data [2], [3], [6], [29], we identified the following transient fault triggers: concurrency; timing of external events; wrong memory state; faulty error handling routines (the fault is triggered by another one); complex input sequences; software aging (e.g., resource leaks)...

    Roberto Natella et al. Emulation of Transient Software Faults for Dependability Assessment: A...

    • ...The relation between complexity and reliability in traditional operating systems has been well studied [2], [3], [4], [5], [6]...
    • ...Reliability literature over the years contains the results of many research efforts directed at analyzing bug reports for popular operating systems [2], [3], [4], [5], [6]...
    • ...In another work on software reliability, Chandra et al. [6] contradicted a popular belief that most application faults can be tolerated using generic techniques such as process pairs...

    Amiya Kumar Maji et al. Characterizing Failures in Mobile OSes: A Case Study with Android and ...
