The Google file system

Authors: Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung
DOI: 10.1145/945445.945450

Citations: 1090

We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points.

The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients.

In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.

    • ...The input data set is stored in a distributed file system implemented on top of a key-value store running on CamCube, as with GFS [25] or HDFS [1]...

    Paolo Costa et al. Camdoop: Exploiting In-network Aggregation for Big Data Applications

    • ...Current studies, like RAMCloud, realize fast failure recovery by using aggressive data partitioning [15, 23], a distributed approach that scatters backup data across hundreds or thousands of disks on backup servers, and quickly reconstructs lost data in the RAM of hundreds of servers. ... RAMCloud realizes fast server failure recovery by using aggressive data partitioning [15, 23]...

    Yiming Zhang et al. RAMCube: Exploiting Network Proximity for RAM-Based Key-Value Store
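The RAMCube snippet describes scattering backup data across many backup servers so that recovery can read it back in parallel. A simplified Python sketch of that idea, assuming random placement of a single backup copy per segment; the function names, segment count, and server count are hypothetical and this is not RAMCloud's actual placement algorithm:

```python
import random
from collections import Counter


def scatter_backups(num_segments: int, backups: list[str], rng: random.Random) -> dict[int, str]:
    """Place each segment's (single, for simplicity) backup copy on a randomly chosen backup server."""
    return {seg: rng.choice(backups) for seg in range(num_segments)}


def recovery_load(placement: dict[int, str]) -> Counter:
    """Count how many segments each backup server must read back after the master fails."""
    return Counter(placement.values())


# Hypothetical cluster: 100 backup servers holding the backups of one master's 1000 segments.
backups = [f"backup{i}" for i in range(100)]
placement = scatter_backups(1000, backups, random.Random(42))
load = recovery_load(placement)

# Many servers share the recovery work, so no single disk has to read all 1000 segments.
print(len(load), max(load.values()))
```

If every segment were backed up on the same server, that one server would have to read all 1000 segments during recovery; scattering spreads the reads across roughly all 100 disks.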

    • ...Many systems solve this problem using a metadata server that stores the location of data blocks [14, 30]. ... Systems such as GFS [14] require larger chunks partially to reduce load on the metadata server. ... Weak consistency guarantees are not uncommon; for example, clients of the Google File System [14] must handle garbage entries in files. ... The Google File System has a centralized master that keeps all metadata in memory [14]...

    Edmund B. Nightingale et al. Flat Datacenter Storage
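The Flat Datacenter Storage snippets describe a centralized master that keeps block locations in memory and note that larger chunks reduce the load on it. A minimal Python sketch, assuming a 64 MB chunk size and a dictionary keyed by (file name, chunk index); the classes, functions, paths, and chunkserver names are hypothetical and this is not the GFS implementation:

```python
from dataclasses import dataclass, field

# Assumption: fixed 64 MB chunks.
CHUNK_SIZE = 64 * 1024 * 1024


@dataclass
class ChunkInfo:
    handle: int           # unique chunk identifier assigned by the master
    replicas: list[str]   # chunkservers that hold a copy of the chunk


@dataclass
class Master:
    # All metadata is held in memory: (file name, chunk index) -> chunk info.
    chunks: dict[tuple[str, int], ChunkInfo] = field(default_factory=dict)
    next_handle: int = 1

    def add_chunk(self, path: str, index: int, replicas: list[str]) -> ChunkInfo:
        info = ChunkInfo(self.next_handle, list(replicas))
        self.chunks[(path, index)] = info
        self.next_handle += 1
        return info

    def lookup(self, path: str, index: int) -> ChunkInfo:
        return self.chunks[(path, index)]


def chunk_index(offset: int) -> int:
    """Client side: translate a byte offset into a chunk index."""
    return offset // CHUNK_SIZE


def lookups_needed(nbytes: int, chunk_size: int) -> int:
    """Master lookups for a sequential read of nbytes from offset 0 (one per chunk touched)."""
    return -(-nbytes // chunk_size)  # ceiling division


master = Master()
master.add_chunk("/logs/web.0", 0, ["cs1", "cs4", "cs7"])
master.add_chunk("/logs/web.0", 1, ["cs2", "cs5", "cs8"])
info = master.lookup("/logs/web.0", chunk_index(100 * 1024 * 1024))
print(info.handle, info.replicas)  # 2 ['cs2', 'cs5', 'cs8']
```

With 64 MB chunks, reading 1 GiB from the start touches 16 chunks (`lookups_needed(1024**3, CHUNK_SIZE) == 16`), whereas 4 KB blocks would need 262144 lookups, which illustrates why larger chunks lighten the metadata server's load.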

    • ...Similar to existing batch-oriented data storage and processing systems [16, 19], Sonora stores data reliably and scales to include more machines as the number of users and the rate of incoming data increase...

    Fan Yang et al. Sonora: A Platform for Continuous Mobile-Cloud Computing

    • ...Interestingly, this property is preserved by many other primary/backup systems, such as Chubby [5], GFS [8], Boxwood [19], PacificA [21] and Chain-Replication [20] (see Section 6). These systems, however, resort to an external service for reconfiguration. ... Primary order is commonly guaranteed by Primary/Backup replication systems, e.g., Chubby [5], GFS [8], Boxwood [19], PacificA [21], chain replication [20], Harp [17] and Echo [11]. ... GFS [8], Chubby [5], chain replication [20] and PacificA [21] that use an external reconfiguration service, we use the system itself as the reconfiguration engine, exploiting the primary order property to streamline reconfigurations with other operations...

    Alexander Shraer et al. Dynamic Reconfiguration of Primary/Backup Clusters
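The Shraer et al. snippet refers to the primary order property of primary/backup systems such as GFS: backups apply operations in the order the primary chose. A minimal Python sketch, assuming the primary stamps each operation with a consecutive serial number and messages may arrive out of order; the class names and operations are hypothetical, and the sketch does not model reconfiguration or failures:

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    next_seq: int = 0

    def order(self, op: str) -> tuple[int, str]:
        """Assign the next consecutive serial number to a client operation."""
        seq = self.next_seq
        self.next_seq += 1
        return seq, op


@dataclass
class Backup:
    applied: list[str] = field(default_factory=list)
    pending: dict[int, str] = field(default_factory=dict)

    def receive(self, seq: int, op: str) -> None:
        """Buffer out-of-order messages; apply operations strictly in the primary's order."""
        self.pending[seq] = op
        # The next serial number to apply equals the count of operations applied so far.
        while len(self.applied) in self.pending:
            self.applied.append(self.pending.pop(len(self.applied)))


primary = Primary()
msgs = [primary.order(op) for op in ["write A", "write B", "write C"]]
backup = Backup()
for seq, op in reversed(msgs):  # deliver in reverse to simulate network reordering
    backup.receive(seq, op)
print(backup.applied)  # ['write A', 'write B', 'write C']
```

Because each backup only advances when the next expected serial number is present, every backup ends up with the same sequence of applied operations regardless of delivery order.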
