Academic
Publications
A Forest-structured Bloom Filter with flash memory

A Forest-structured Bloom Filter with flash memory,10.1109/MSST.2011.5937232,Guanlin Lu,Biplob Debnath,David H. C. Du

A Forest-structured Bloom Filter with flash memory   (Citations: 1)
BibTex | RIS | RefWorks Download
A Bloom Filter (BF) is a data structure based on probability to compactly represent/record a set of elements (keys). It has wide applications on efficiently identifying a key that has been seen before with minimum amount of recording space used. BF is heavily used in chunking based data de-duplication. Traditionally, a BF is implemented as in-RAM data structure; hence its size is limited by the available RAM space on the machine. For certain applications like data de-duplication that require a big BF beyond the size of available RAM space, it becomes necessary to store a BF into a secondary storage device. Since BF operations are inherently random in nature, magnetic disk provides worse performance for the random read and write operations. It will not be a good fit for storing the large BF. Flash memory based Solid State Drive (SSD) has been considered as an emerging storage device that has superior performance and can potentially replace disks as the preferred secondary storage devices. However, several special characteristics of flash memory make designing a flash memory based BF very challenging. In this paper, our goal is to design an efficient flash memory based BF that is fully aware of these physical characteristics. To this end, we propose a Forest-structured BF design (FBF). FBF uses a combination of RAM and flash memory to design a BF. BF is stored on the flash, while RAM helps to mitigate the impact of slow write performance of flash memory. In addition, in-flash BF is organized in a forest-like structure in order to improve the lookup performance. Our experimental results show that FBF design achieves 2 times faster processing speed with 50% less number of flash write operations when compared with the existing flash memory based BF designs. I. INTRODUCTION A Bloom Filter (BF) is a bit vector that compactly represents a set of items (keys) and supports key query/insert operations. It can definitely tell if a key is not present, but it may not tell with guarantee that a key is indeed present. In other word, an answer given by a BF bears certain false positive rate. To keep this false positive rate low, traditional BF designs have to set Bloom Filter size a priori to be a few times larger than the maximum number of items represented. Bloom Filters are heavily used in chunking based data de-duplication. Chunking based de-duplication is an efficient technique to eliminate data redundancy within both backup data and data stored in primary storage. Traditionally, chunk- ing based de-duplication requires a chunk index that consists of each chunk's identifier (i.e., a SHA1 hashed value computed based on chunk's content) and its resided location in disk. This
Conference: Symposium on Mass Storage Systems - MSS , pp. 1-6, 2011
Cumulative Annual
View Publication
The following links allow you to view full publications. These links are maintained by other sources not affiliated with Microsoft Academic Search.
    • ...Recent deduplication design has mainly focused on its write performance by: (1) improving the duplication detection and chunk existence querying efficiency with efficient chunking, faster hash indexing [2], locality-preserving index caching [1], and efficient bloom filters [3]; (2) compressing the unique chunks and performing (fixed-size) large writes through containers [4], [1] or similar structures [5]...

    Youngjin Namet al. Chunk Fragmentation Level: An Effective Indicator for Read Performance...

Sort by: