Cristian Ungureanu, Benjamin Atkin, Akshat
Aranya, Salil Gokhale, Stephen Rago, Grzegorz
Całkowski, Cezary Dubnicki, and Aniruddha
Bohra
NEC Laboratories America
Presented By : G.A.Dilruk (148209B)
 What is HYDRAstor?
 The Research Problem
 Challenges
 The Design of HydraFS
 Evaluation
 Future Enhancements
 HYDRAstor is a content-addressable storage (CAS) system
used to build storage solutions.
 Data deduplication
 High throughput
 Multiple storage nodes
 Data replication
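To make "content-addressable" and "deduplication" concrete, here is a minimal in-memory sketch. It is not HYDRAstor's API: the class name, the methods, and the choice of SHA-256 as the address function are all assumptions made for illustration.

```python
import hashlib


class ContentAddressableStore:
    """Toy CAS (hypothetical): a block's address is the SHA-256 of its content."""

    def __init__(self):
        self._blocks = {}  # address (hex digest) -> immutable bytes

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address, so it is stored only once.
        self._blocks.setdefault(address, bytes(data))
        return address

    def get(self, address: str) -> bytes:
        return self._blocks[address]

    def block_count(self) -> int:
        return len(self._blocks)


store = ContentAddressableStore()
a1 = store.put(b"backup payload")
a2 = store.put(b"backup payload")     # duplicate write, deduplicated
a3 = store.put(b"different payload")  # new content, new address
print(a1 == a2, store.block_count())
```

Because the address is derived from the content, a duplicate write costs no extra space, and a block can never be changed without changing its address.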
 The main barrier is the absence of a standard API for
accessing data in HYDRAstor.
 Users are reluctant to change existing applications.
 Applications may need to deal with unique
characteristics of CAS, such as block immutability
and high latency.
 Solution: build a standard file system on top of
the HYDRAstor CAS system.
 Blocks are immutable, so data updates are more
expensive in a CAS.
 Latency of the block operations is very high.
 Cache misses for metadata blocks have a
significant impact on performance.
 Variable block sizes are needed for better deduplication.
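The cost of immutability can be illustrated with a small sketch. It assumes a hypothetical layout in which a metadata (index) block stores the addresses of its data blocks as a JSON list; the real HydraFS on-disk format is not described on these slides.

```python
import hashlib
import json

blocks = {}  # address -> bytes; entries are never modified in place


def put(data: bytes) -> str:
    address = hashlib.sha256(data).hexdigest()
    blocks.setdefault(address, data)
    return address


def put_index(child_addresses):
    # Hypothetical metadata block: a JSON list of child block addresses.
    return put(json.dumps(child_addresses).encode())


children = [put(b"chunk-0"), put(b"chunk-1"), put(b"chunk-2")]
root_v1 = put_index(children)

# "Updating" chunk-1 really means writing a new block with a new address...
children_v2 = list(children)
children_v2[1] = put(b"chunk-1 (edited)")
# ...which forces a new index block, and therefore a new root address.
root_v2 = put_index(children_v2)
```

A one-block edit therefore writes at least two blocks (the data block and every metadata block on the path to the root), which is why updates are more expensive than in a file system that overwrites in place.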
 High throughput of sequential reads and writes.
 Minimize the number of dependent I/O
operations.
 Availability guarantees of HydraFS must be no
worse than those of a standard Unix file system.
 File system must efficiently support both local and
remote file access.
Challenge: Design Strategy
 Blocks are immutable: decouple data and metadata processing
through a log buffer, and batch metadata operations.
 High latency of block operations: read cache and write buffer.
 Cache misses for metadata blocks: fixed-size caches, plus admission
control to limit the number of concurrent operations.
 Variable block size: chunking algorithm similar to Rabin
fingerprinting.
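The variable-block-size strategy can be sketched as content-defined chunking. The code below uses a simple polynomial rolling hash as a stand-in; it is not Rabin fingerprinting proper and not the algorithm HydraFS uses, and every constant (window size, mask, minimum and maximum chunk sizes) is an assumption chosen for readability.

```python
import random

WINDOW = 16                       # bytes covered by the rolling hash
BASE = 257
MOD = (1 << 31) - 1
MASK = (1 << 6) - 1               # cut when the low 6 bits of the hash are zero
MIN_CHUNK = 32                    # must be >= WINDOW so cuts depend only on content
MAX_CHUNK = 256                   # forced cut if no boundary is found
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window


def chunk(data: bytes):
    """Split data into variable-size chunks at content-defined boundaries."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        pos = i - start                       # offset inside the current chunk
        if pos >= WINDOW:
            # Slide the window: drop the oldest byte's contribution.
            h = (h - data[i - WINDOW] * TOP) % MOD
        h = (h * BASE + byte) % MOD
        size = pos + 1
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])           # trailing partial chunk
    return chunks


rng = random.Random(0)
data = bytes(rng.randrange(256) for _ in range(4096))
parts = chunk(data)
shifted = chunk(b"inserted prefix" + data)
shared = len(set(parts) & set(shifted))
print(len(parts), "chunks;", shared, "also appear after inserting a prefix")
```

The point of cutting on content rather than at fixed offsets is that an insertion near the start of a file tends to change only nearby chunks, so later chunks keep their content, hash to the same addresses, and deduplicate against the earlier version.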
[Figure: HydraFS architecture diagram. Labels shown: File Server, Commit Server, Transaction Log, Data Blocks, Super Blocks, File Operation.]
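One way to read the log-and-batch strategy alongside the component names in the diagram is sketched below. The division of work, the record format, and all class and method names are assumptions for illustration: the file-server side only appends records to a transaction log, and a commit server later applies them to the metadata in batches.

```python
from collections import deque


class TransactionLog:
    """Hypothetical append-only log of metadata operations."""

    def __init__(self):
        self._records = deque()

    def append(self, record):
        self._records.append(record)

    def drain(self, limit):
        # Remove and return up to `limit` of the oldest records.
        batch = []
        while self._records and len(batch) < limit:
            batch.append(self._records.popleft())
        return batch


class CommitServer:
    """Applies logged operations to metadata one batch at a time."""

    def __init__(self, log, batch_size=3):
        self.log = log
        self.batch_size = batch_size
        self.metadata = {}   # path -> file size
        self.versions = 0    # number of metadata versions written

    def commit_once(self):
        batch = self.log.drain(self.batch_size)
        if not batch:
            return 0
        for path, size in batch:
            self.metadata[path] = size
        self.versions += 1   # one metadata rewrite covers the whole batch
        return len(batch)


log = TransactionLog()
committer = CommitServer(log, batch_size=3)
for n in range(5):
    # File-server side: record the operation and return without touching metadata.
    log.append((f"/backup/file{n}", 1024 * (n + 1)))

while committer.commit_once():
    pass
print(committer.versions, "metadata versions for", len(committer.metadata), "operations")
```

Batching matters because blocks are immutable: each metadata change rewrites blocks, so applying several logged operations per rewrite spreads that cost over many file operations.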
 Comparison of raw device and HydraFS file system
throughput for iSCSI and Hydra.
 Hydra and HydraFS write throughput with varying
duplication ratio.
 Use multiple file server nodes to make failover
transparent and automatic.
 Integrate other algorithms, such as bimodal
chunking, into the current chunking algorithm, which is
similar to Rabin fingerprinting.
 HydraFS is acceptable as a secondary storage
platform for backup appliances. A strategy to
reduce I/O latency is needed for primary storage.