Hydra Block Store
Hydra Block Store provides a single pool of content-addressable blocks, thus allowing
duplicate elimination to work across all the storage nodes in the pool. It supports
dynamic node additions and removals, is
resilient to multiple disk and node failures, and can recover from such failures
without administrator intervention. It is optimized for providing high throughput
read and write access for single or multiple concurrent streams.
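
As a rough illustration of why content addressing yields duplicate elimination, the following minimal sketch stores each block under a hash of its contents, so writing the same data twice keeps only one copy. The class name `BlockStore`, its methods, and the choice of SHA-256 are assumptions made for this example; they do not describe the actual Hydra Block Store interface, and the sketch ignores distribution, resiliency, and failure recovery entirely.

```python
import hashlib


class BlockStore:
    """Toy in-memory content-addressable store (hypothetical, illustrative only)."""

    def __init__(self):
        # Maps content address (hex digest) -> block bytes.
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself (SHA-256 is an assumption).
        address = hashlib.sha256(data).hexdigest()
        # A block that is already present is not stored a second time.
        self._blocks.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._blocks[address]

    def __len__(self) -> int:
        return len(self._blocks)


store = BlockStore()
first = store.put(b"backup data")
second = store.put(b"backup data")   # duplicate write, same address
third = store.put(b"other data")
```

Because the address depends only on the content, any node in a shared pool could in principle detect that a block is already stored, which is what lets deduplication span the whole pool rather than a single node.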
See HYDRAstor.
Hydra File System
HydraFS is a file system using Hydra Block Store as persistent storage, and
is designed for high-throughput streaming workloads. The combination of
block immutability, high latency of I/O operations, and high bandwidth
requirements poses interesting challenges for the architecture, design, and
implementation of the file system.
See HYDRAstor.
Distributed Load Balancing File System
The DLBFS project aims to extend HydraFS into a distributed file
system that is capable not only of fail-over but also of non-disruptive
dynamic migration of mount points for load-balancing. As in the case of
HydraFS, the nature of content-addressable storage makes this problem
differ in important ways from those typically encountered in distributed
file systems.
See HYDRAstor.
Content Defined Chunking
Using content-defined chunking in HYDRAstor brings two challenges. The first
concerns speed. To achieve in-line deduplication at high throughput,
chunking needs to be both fast and efficient.
The second concerns the duplicate elimination induced by the chunking process.
We are investigating algorithms capable of producing larger average chunk sizes
while retaining the duplicate elimination ratios achievable with smaller
chunks.
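
For readers unfamiliar with the technique, here is a minimal sketch of content-defined chunking using a Gear-style rolling hash. This is a generic textbook approach chosen for illustration, not the algorithm used or investigated in HYDRAstor. The function name `chunk`, the constants `MIN_SIZE`, `MAX_SIZE`, and `MASK`, and the seeded lookup table `GEAR` are all assumptions of this example.

```python
import random

MIN_SIZE = 16            # smallest chunk the cut test may produce (assumed value)
MAX_SIZE = 256           # a cut is forced at this size (assumed value)
MASK = 0x3F << 26        # test the top 6 bits: roughly one cut per 64 bytes

# Fixed pseudo-random table mapping each byte value to a 32-bit number.
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes) -> list:
    """Split data at positions chosen by the content, not by fixed offsets."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Rolling hash: older bytes shift out of the 32-bit state over time.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])   # trailing partial chunk
    return chunks


sample = bytes(random.Random(1).getrandbits(8) for _ in range(4096))
pieces = chunk(sample)
```

Widening `MASK` by one bit roughly doubles the expected chunk size, which is the knob behind the trade-off described above: larger chunks mean less per-chunk overhead, but a single changed byte then invalidates more data, which tends to lower the duplicate elimination ratio.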
See HYDRAstor.
Deduplicated Primary Storage
The Hydra File System is optimized for high-throughput
streaming read and write operations. However, for metadata-intensive workloads
its performance is quite poor due to the high latency of block store operations.
This work uses solid-state disks (SSDs) to absorb the latency cost of
metadata-intensive operations, enabling the Hydra File System to perform well
enough to be used as primary storage.
See HYDRAstor.
Energy Efficiency of Distributed Storage Systems
The goal of this project is to reduce energy
consumption in distributed storage through autonomic and adaptive
placement and caching of data. We are investigating how to best leverage
SSDs for this purpose, as well as algorithms for data and metadata
placement and caching based on observed access patterns. This work also
involves developing simulation tools to help evaluate the algorithms.
Quality of Service in Distributed Storage Systems
One of the challenges in sharing a
storage system is that one user's activity interferes with that of
other users. The goal of this project is to provide mechanisms through
which the storage system can provide quality of service guarantees to
multiple users in a way that allows efficient utilization of
resources.
Solid State Drives
Due to the many desirable features of SSDs, we are
investigating possible uses of them in many of our other projects: keeping
key metadata in the Hydra Block Store, providing temporary storage for the
file system in the Primary Storage project, holding indirection maps that
support data placement and caching algorithms for energy efficiency, etc. Their
use poses interesting problems, from simple optimizations of fundamental
data structures that take advantage of their unique characteristics, to
potential redesign of the storage stack for SSD storage.