Keywords (7)
Case Study
Mean Time To Failure
Operating System
Research Agenda
Mean Time To Repair
Recovery Oriented Computing
Total Cost of Ownership
Related Publications (21)
Distributed Reset
To Err is Human
Architecture and Dependability of Large-Scale Internet Services
Seneca: remote mirroring done write
The cost of lost data
Recovery Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies
(Citations: 185)
David Patterson, Aaron Brown, Pete Broadwell, George Candea, Mike Chen, James Cutler, Patricia Enriquez, Armando Fox, Emre Kiciman, Matthew Merzbacher, David Oppenheimer, Naveen Sastry
It is time to broaden our performance-dominated research agenda. A four order of magnitude increase in performance since the first ASPLOS in 1982 means that few outside the CS&E research community believe that speed is the only problem of computer hardware and software. Current systems crash and freeze so frequently that people become violent. Fast but flaky should not be our 21st century legacy.
Recovery Oriented Computing (ROC) takes the perspective that hardware faults, software bugs, and operator errors are facts to be coped with, not problems to be solved. By concentrating on Mean Time to Repair (MTTR) rather than Mean Time to Failure (MTTF), ROC reduces recovery time and thus offers higher availability. Since a large portion of system administration is dealing with failures, ROC may also reduce total cost of ownership. One to two orders of magnitude reduction in cost mean that the purchase price of hardware and software is now a small part of the total cost of ownership. In addition to giving the motivation and definition of ROC, we introduce failure data for Internet sites that shows that the leading cause of outages is operator error. We also demonstrate five ROC techniques in five case studies, which we hope will influence designers of architectures and operating systems. If we embrace availability and maintainability, systems of the future may compete on recovery performance rather than just SPEC performance, and on total cost of ownership rather than just system price. Such a change may restore our pride in the architectures and operating systems we craft.
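The abstract's claim that a shorter MTTR raises availability can be illustrated with the standard steady-state definition, availability = MTTF / (MTTF + MTTR). The sketch below is a minimal illustration under that assumption. The function names and the MTTF/MTTR figures are hypothetical and are not data from the paper.

```python
import math

HOURS_PER_YEAR = 24 * 365


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_hours_per_year(mttf_hours: float, mttr_hours: float) -> float:
    """Expected downtime per year implied by the availability figure."""
    return (1.0 - availability(mttf_hours, mttr_hours)) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical figures chosen only to illustrate the trade-off.
    scenarios = {
        "baseline": (1000.0, 10.0),
        "10x longer MTTF": (10000.0, 10.0),
        "10x shorter MTTR": (1000.0, 1.0),
    }
    for label, (mttf, mttr) in scenarios.items():
        print(
            f"{label}: availability={availability(mttf, mttr):.5f}, "
            f"downtime={downtime_hours_per_year(mttf, mttr):.1f} h/year"
        )
```

Under this definition, cutting MTTR by a factor of ten yields the same availability as stretching MTTF by a factor of ten, which is one way to read the abstract's argument for focusing on repair time.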
Published in 2002.
View Publication
The following links allow you to view full publications. These links are maintained by other sources not affiliated with Microsoft Academic Search.
www.ece.rutgers.edu
wwwse.inf.tu-dresden.de
ssdl.stanford.edu
pompone.cs.ucsb.edu
www.ssrc.ucsc.edu
se.inf.tu-dresden.de
research.microsoft.com
naveen.ksastry.com
research.microsoft.com
roc.cs.berkeley.edu
Citation Context (105)
...The Recovery-Oriented Computing project has explored integrating an undo operation into software systems [36] and constructing systems out of a set of individually rebootable components [10]...
Brian Demsky, et al. Bristlecone: Language Support for Robust Software Applications
...While failure recovery has been previously studied in various fields including operating systems, databases, and internet services [2], [24], [25], these studies are either specific to a particular problem domain or hardly applicable to improve checkpoint-based restart...
...A representative work is the ROC project from Berkeley and Stanford [25]...
Yawei Li, et al. FREM: A Fast Restart Mechanism for General Checkpoint/Restart
...Since such service disruptions are ill affordable for many mission-critical systems, such as air control systems, credit card authorization, and brokerage operations [4], these systems demand highly dependable services and require services to be available 24X7...
Haibo Chen, et al. Dynamic Software Updating Using a Relaxed Consistency Model
...[6] Integrated diagnostic support, system-wide undo support, redundancy and isolation are the characteristics of recovery oriented computing...
...[6] ROC is dependent on the participation from industry & it needs benchmarks that are realistic and based on real world environment...
Sumreena Bano, et al. Computing technology: Diversities and difficulties
...As per the Recovery Oriented Computing (ROC) [1] perspective the hardware faults, software bugs and operator errors are facts to be coped with, not problems to be solved...
Sudhanshu Shekhar Jha, et al. Minimizing Restart Time for Fast Rejuvenation and Availability Enhance...
References (76)

Computer architecture: a quantitative approach (Citations: 1560)
J. Hennessy, D. Patterson
Journal: ACM Transactions on Programming Languages and Systems - TOPLAS, 1998

When Virtual is Better than Real (Citations: 153)
Peter M. Chen, Brian D. Noble
Conference: Workshop on Workstation Operating Systems (now HotOS)/Workshop on Hot Topics in Operating Systems - HotOS(WWOS), pp. 133-138, 2001

Virtual Services: A New Abstraction for Server Consolidation (Citations: 44)
John Reumann, Ashish Mehra, Kang G. Shin, Dilip D. Kandlur
Conference: USENIX Technical Conference - USENIX, pp. 117-130, 2000

INTEGRATED EVENT MANAGEMENT: EVENT CORRELATION USING DEPENDENCY GRAPHS (Citations: 86)
Boris Gruschke
Conference: Distributed Systems, Operations and Management - DSOM, 1998

Algorithms for Clustering Data (Citations: 4196)
Anil K. Jain, Richard C. Dubes
Published in 1988.

Citations (185)

Bristlecone: Language Support for Robust Software Applications (Citations: 2)
Brian Demsky, Sivaji R. Sundaramurthy
Journal: IEEE Transactions on Software Engineering - TSE, vol. 37, no. 1, pp. 4-23, 2011

FREM: A Fast Restart Mechanism for General Checkpoint/Restart (Citations: 1)
Yawei Li, Zhiling Lan
Journal: IEEE Transactions on Computers - TC, vol. 60, no. 5, pp. 639-652, 2011

Dynamic Software Updating Using a Relaxed Consistency Model
Haibo Chen, Jie Yu, Chengqun Hang, Binyu Zang, Pen-Chung Yew
Journal: IEEE Transactions on Software Engineering - TSE, vol. 37, no. 5, pp. 679-694, 2011

Computing technology: Diversities and difficulties
Sumreena Bano, Syed Faisal Ahmed Bukhari, Muhammad Asad Khan Niazi, Sharraf Hussain
Conference: International Conference on Computer Research and Development - ICCRD, 2011

Minimizing Restart Time for Fast Rejuvenation and Availability Enhancement
Sudhanshu Shekhar Jha, Adrian Jonel Krdu, Miroslaw Malek
Conference: International Symposium on Autonomous Decentralized Systems - ISADS, 2011