Department of Grid Storage
HYDRAstor

Overview

This research is aimed at providing a scale-out storage platform based on a community of nodes that operate as a single system and provide a set of data management services. The research led to a product launched by NEC, available in the US and Japan. More information is available on the HYDRAstor home page at NEC America.

Projects

Hydra Block Store

Hydra Block Store provides one pool of content-addressable blocks, which allows duplicate elimination to work across all the storage nodes in the pool. It supports dynamic node additions and removals, is resilient to multiple disk and node failures, and can recover from such failures without administrator intervention. It is optimized for high-throughput read and write access for single or multiple concurrent streams.

Hydra File System

HydraFS is a file system that uses Hydra Block Store as persistent storage and is designed for high-throughput streaming workloads. The combination of block immutability, high latency of I/O operations, and high bandwidth requirements poses interesting challenges for the architecture, design, and implementation of the file system.

Distributed Load Balancing File System

The DLBFS project is aimed at extending HydraFS to a distributed file system that is capable not only of fail-over but also of non-disruptive dynamic migration of mount points for load balancing. As in the case of HydraFS, the nature of content-addressable storage makes this problem differ in important ways from those typically encountered in distributed file systems.

Content Defined Chunking

Using content-defined chunking in HYDRAstor brings two challenges. The first concerns speed: to achieve in-line deduplication at high throughput, chunking needs to be both fast and efficient. The second concerns the duplicate elimination induced by the chunking process. We are investigating algorithms capable of producing larger average chunk sizes while retaining the duplicate elimination ratios achievable with smaller chunks. We developed algorithms that achieve 2-4 times larger average chunks for comparable duplicate elimination by using knowledge about input stream properties and by relying on the ability of the Hydra Block Store to quickly answer queries about the existence of already stored chunks.
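
As a rough illustration of the ideas above, the following Python sketch combines a basic content-defined chunker with a toy content-addressable store that answers existence queries. It is a minimal sketch under stated assumptions, not the HYDRAstor implementation: the Gear-style rolling hash, the chunk size limits, the SHA-256 addressing, and the `BlockStore`, `write_stream`, and `read_stream` names are all hypothetical choices made for the example. It does not reproduce the larger-chunk algorithms mentioned above.

```python
import hashlib
import random

# All constants are illustrative assumptions, not HYDRAstor parameters.
MIN_CHUNK = 64     # never cut a chunk before this many bytes
MAX_CHUNK = 1024   # always cut a chunk at this many bytes
MASK_BITS = 8      # cut when the top 8 hash bits are zero (about 1 in 256 positions)

# Gear-style rolling hash table: one pseudo-random 32-bit value per byte value.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes):
    """Yield content-defined chunks of data."""
    start = 0
    h = 0
    for i, b in enumerate(data):
        # The 1-bit left shift ages out bytes older than 32 positions,
        # so the hash depends only on a sliding window of recent content.
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        length = i - start + 1
        if length < MIN_CHUNK:
            continue
        if (h >> (32 - MASK_BITS)) == 0 or length >= MAX_CHUNK:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]  # trailing chunk, possibly shorter than MIN_CHUNK


class BlockStore:
    """Toy in-memory content-addressable store (hypothetical interface)."""

    def __init__(self):
        self._blocks = {}

    def contains(self, address: str) -> bool:
        return address in self._blocks

    def put(self, address: str, block: bytes) -> None:
        self._blocks[address] = block

    def get(self, address: str) -> bytes:
        return self._blocks[address]


def write_stream(store: BlockStore, data: bytes):
    """Store data chunk by chunk; return (addresses, newly stored block count)."""
    addresses = []
    new_blocks = 0
    for block in chunk(data):
        address = hashlib.sha256(block).hexdigest()  # the content is the address
        if not store.contains(address):              # existence query
            store.put(address, block)
            new_blocks += 1
        addresses.append(address)
    return addresses, new_blocks


def read_stream(store: BlockStore, addresses):
    """Reassemble a stream from its list of block addresses."""
    return b"".join(store.get(a) for a in addresses)
```

Writing the same stream a second time stores no new blocks, because every chunk address is already present in the store. Because boundaries depend on content rather than on offsets, an insertion near the start of a stream typically changes only the chunks around the edit.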

Primary Storage

The Hydra File System is optimized for high-throughput streaming read and write operations. However, due to the high latency of block store operations, it exhibits poor performance for metadata-intensive workloads. This work uses solid-state drives (SSDs) to absorb the latency cost of metadata-intensive operations, enabling the Hydra File System to perform well enough to be used as primary storage.